pyspark.streaming.DStream.countByValueAndWindow

DStream.countByValueAndWindow(windowDuration, slideDuration, numPartitions=None)[source]

Return a new DStream in which each RDD contains the count of distinct elements in RDDs in a sliding window over this DStream.

Parameters
  • windowDuration – width of the window; must be a multiple of this DStream’s batching interval

  • slideDuration – sliding interval of the window (i.e., the interval after which the new DStream will generate RDDs); must be a multiple of this DStream’s batching interval

  • numPartitions – number of partitions of each RDD in the new DStream.