pyspark.RDD
class pyspark.RDD(jrdd, ctx, jrdd_deserializer=AutoBatchedSerializer(PickleSerializer()))

A Resilient Distributed Dataset (RDD), the basic abstraction in Spark. Represents an immutable, partitioned collection of elements that can be operated on in parallel.
__init__(jrdd, ctx, jrdd_deserializer=AutoBatchedSerializer(PickleSerializer()))

Initialize self. See help(type(self)) for accurate signature.
Methods
Attributes