pyspark.streaming.DStream

class pyspark.streaming.DStream(jdstream, ssc, jrdd_deserializer)

A Discretized Stream (DStream), the basic abstraction in Spark Streaming, is a continuous sequence of RDDs (of the same type) representing a continuous stream of data (see RDD in the Spark core documentation for more details on RDDs).

DStreams can be created either from live data (such as data from TCP sockets) using a StreamingContext, or by transforming existing DStreams with operations such as map, window, and reduceByKeyAndWindow. While a Spark Streaming program is running, each DStream periodically generates an RDD, either from live data or by transforming the RDD generated by a parent DStream.
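As a rough illustration of both creation paths, the sketch below builds a source DStream with StreamingContext.queueStream (standing in for a live source such as a TCP socket) and derives a second DStream from it with flatMap, map, and reduceByKeyAndWindow. The helper names split_words, add, build_word_counts, and run_example are hypothetical, as are the batch interval, window sizes, and sample data. It assumes a PySpark installation that still ships the pyspark.streaming module; the PySpark imports sit inside run_example so the helpers can be read and tried without a cluster.

```python
def split_words(line):
    """Split one input line into whitespace-separated words."""
    return line.split()


def add(a, b):
    """Associative reduce function used to sum the per-word counts."""
    return a + b


def build_word_counts(lines, window_seconds=4, slide_seconds=2):
    """Derive a windowed word-count DStream from a DStream of text lines.

    Every call below returns a new DStream whose parent is the previous one.
    """
    words = lines.flatMap(split_words)
    pairs = words.map(lambda word: (word, 1))
    # Passing None as the inverse function recomputes each window from
    # scratch, which avoids the need for a checkpoint directory.
    return pairs.reduceByKeyAndWindow(add, None, window_seconds, slide_seconds)


def run_example(timeout_seconds=10):
    """Run the pipeline locally on a small in-memory queue of RDDs."""
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "DStreamSketch")
    ssc = StreamingContext(sc, 1)  # 1-second batch interval (an assumption)

    # queueStream feeds one queued RDD per batch; a real job might use
    # ssc.socketTextStream(host, port) here instead.
    batches = [sc.parallelize(["a b a"]), sc.parallelize(["b c"])]
    lines = ssc.queueStream(batches)

    build_word_counts(lines).pprint()

    ssc.start()
    ssc.awaitTerminationOrTimeout(timeout_seconds)
    ssc.stop(stopSparkContext=True, stopGraceFully=False)
```

Calling run_example() starts a local streaming job for a few seconds and prints the windowed counts for each batch. The window and slide durations are kept as multiples of the batch interval, which Spark Streaming requires.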

Internally, a DStream is characterized by a few basic properties:
  • A list of other DStreams that the DStream depends on

  • A time interval at which the DStream generates an RDD

  • A function that is used to generate an RDD after each time interval
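The three properties listed above can be pictured with a small pure-Python model. This is only a conceptual sketch, not PySpark's implementation: ToyDStream, from_batches, get_batch, and the attribute names are all hypothetical, plain lists stand in for RDDs, and time is counted in whole seconds.

```python
class ToyDStream:
    """Toy stand-in for a DStream (hypothetical, not the PySpark class)."""

    def __init__(self, dependencies, slide_interval, compute):
        self.dependencies = dependencies      # parent streams this one depends on
        self.slide_interval = slide_interval  # seconds between generated batches
        self.compute = compute                # function: time -> batch (a list)

    def get_batch(self, time):
        """Return the batch for `time`, or None if no batch is due then."""
        if time % self.slide_interval != 0:
            return None
        return self.compute(time)

    def map(self, func):
        """Return a child stream that transforms each parent batch."""
        parent = self

        def compute(time):
            batch = parent.get_batch(time)
            return None if batch is None else [func(item) for item in batch]

        return ToyDStream([parent], parent.slide_interval, compute)


def from_batches(batches, slide_interval):
    """Build a source stream with no parents from a fixed list of batches."""

    def compute(time):
        index = time // slide_interval
        # Once the prepared data runs out, emit empty batches.
        return batches[index] if 0 <= index < len(batches) else []

    return ToyDStream([], slide_interval, compute)


source = from_batches([[1, 2], [3]], slide_interval=2)
doubled = source.map(lambda x: x * 2)
```

Here source has no dependencies and produces its batches directly, while doubled lists source as its only dependency, inherits its interval, and computes each batch by transforming the parent's batch for the same time.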

__init__(jdstream, ssc, jrdd_deserializer)

Initialize self. See help(type(self)) for accurate signature.

Methods