pyspark.RDD.top
RDD.top(num, key=None)
Get the top N elements from an RDD.
Note
This method should only be used if the resulting list is expected to be small, as the entire result is loaded into the driver’s memory.
Note
The returned list is sorted in descending order (by key, if one is given).
>>> sc.parallelize([10, 4, 2, 12, 3]).top(1)
[12]
>>> sc.parallelize([2, 3, 4, 5, 6], 2).top(2)
[6, 5]
>>> sc.parallelize([10, 4, 2, 12, 3]).top(3, key=str)
[4, 3, 2]
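The semantics of the examples above can be approximated without a Spark cluster. The following is a minimal pure-Python sketch, not PySpark's own code: top_sketch is a hypothetical helper, and modelling partitions as plain lists is an assumption made for illustration. It keeps the num largest elements of each partition and then merges the partial results, which is why only a small list needs to reach the driver and why the output is in descending order.

```python
import heapq
from functools import reduce


def top_sketch(partitions, num, key=None):
    """Hypothetical pure-Python analogue of RDD.top over a list of partitions."""
    # Keep only the num largest elements of each partition (largest first).
    per_partition = [heapq.nlargest(num, part, key=key) for part in partitions]
    # Merge the partial results pairwise, again keeping only the num largest.
    return reduce(
        lambda a, b: heapq.nlargest(num, a + b, key=key),
        per_partition,
        [],
    )


print(top_sketch([[2, 3], [4, 5, 6]], 2))           # [6, 5]
print(top_sketch([[10, 4, 2, 12, 3]], 3, key=str))  # [4, 3, 2]
```

With key=str the elements are compared as strings, so "4" > "3" > "2" > "12" > "10", which explains why the third example returns [4, 3, 2] rather than [12, 10, 4].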