pyspark.streaming.DStream.groupByKeyAndWindow
DStream.groupByKeyAndWindow(windowDuration, slideDuration, numPartitions=None)
Return a new DStream by applying groupByKey over a sliding window. Similar to DStream.groupByKey(), but the values for each key are grouped across all batches that fall within the window rather than within a single batch.
- Parameters
windowDuration – width of the window, in seconds; must be a multiple of this DStream’s batching interval
slideDuration – sliding interval of the window, in seconds (i.e., the interval after which the new DStream will generate RDDs); must be a multiple of this DStream’s batching interval
numPartitions – number of partitions of each RDD in the new DStream
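The window semantics described by these parameters can be illustrated without a Spark cluster. The following is a minimal pure-Python sketch, not Spark's implementation: the function name `group_by_key_and_window`, its `batch_interval` argument, and the sample data are all hypothetical, and it assumes one result is emitted every `slide_duration` and covers the batches from the preceding `window_duration`.

```python
from collections import defaultdict


def group_by_key_and_window(batches, batch_interval, window_duration, slide_duration):
    """Simulate windowed grouping over a list of per-batch (key, value) lists.

    Hypothetical helper for illustration only; it is not part of PySpark.
    """
    # Both durations must be multiples of the batch interval, as the
    # parameter descriptions above require.
    for name, value in (("window_duration", window_duration),
                        ("slide_duration", slide_duration)):
        if value <= 0 or value % batch_interval != 0:
            raise ValueError(f"{name} must be a positive multiple of the batch interval")

    window_len = window_duration // batch_interval  # batches covered by one window
    slide_len = slide_duration // batch_interval    # batches between two results

    results = []
    # Emit one result each time `slide_len` further batches have arrived.
    for end in range(slide_len, len(batches) + 1, slide_len):
        start = max(0, end - window_len)  # early windows are only partially filled
        grouped = defaultdict(list)
        for batch in batches[start:end]:
            for key, value in batch:
                grouped[key].append(value)
        results.append(dict(grouped))
    return results


# Assumed sample input: four batches arriving at a 1-second batch interval.
batches = [
    [("a", 1), ("b", 2)],  # batch 1
    [("a", 3)],            # batch 2
    [("b", 4), ("a", 5)],  # batch 3
    [("c", 6)],            # batch 4
]
windows = group_by_key_and_window(
    batches, batch_interval=1, window_duration=3, slide_duration=2
)
print(windows)
# [{'a': [1, 3], 'b': [2]}, {'a': [3, 5], 'b': [4], 'c': [6]}]
```

With a 3-second window sliding every 2 seconds, the second result covers batches 2 to 4, so the value `3` for key `a` appears in both windows. Overlap like this occurs whenever windowDuration is larger than slideDuration.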