pyspark.RDD.saveAsNewAPIHadoopDataset
RDD.saveAsNewAPIHadoopDataset(conf, keyConverter=None, valueConverter=None)

Output a Python RDD of key-value pairs (of form RDD[(K, V)]) to any Hadoop file system, using the new Hadoop OutputFormat API (mapreduce package). Keys and values are converted for output using either user-specified converters or, by default, "org.apache.spark.api.python.JavaToWritableConverter".

Parameters
conf – Hadoop job configuration, passed in as a dict
keyConverter – fully qualified classname of the key converter (None by default)
valueConverter – fully qualified classname of the value converter (None by default)
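
Example

The following is a minimal sketch of how the conf dict might be assembled and passed in, assuming SequenceFile output with integer keys and text values. The helper names build_output_conf and save_pairs are hypothetical and not part of PySpark; the configuration keys are the standard Hadoop mapreduce property names, and sc is assumed to be an already running SparkContext.

```python
def build_output_conf(output_dir):
    """Hypothetical helper: Hadoop job configuration for SequenceFile output."""
    return {
        # OutputFormat from the new (mapreduce) API
        "mapreduce.job.outputformat.class":
            "org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat",
        # Writable classes for the keys and values being written
        "mapreduce.job.output.key.class": "org.apache.hadoop.io.IntWritable",
        "mapreduce.job.output.value.class": "org.apache.hadoop.io.Text",
        # Destination directory on any Hadoop-supported file system
        "mapreduce.output.fileoutputformat.outputdir": output_dir,
    }


def save_pairs(sc, output_dir):
    """Hypothetical helper: write a small RDD of (int, str) pairs.

    Assumes `sc` is a live SparkContext supplied by the caller.
    """
    rdd = sc.parallelize([(1, "a"), (2, "b"), (3, "c")])
    # keyConverter and valueConverter are left as None, so the default
    # converter described above is applied to keys and values.
    rdd.saveAsNewAPIHadoopDataset(conf=build_output_conf(output_dir))
```

Calling save_pairs(sc, path) from a PySpark session would write the pairs under path. File-based Hadoop output formats generally refuse to write into a directory that already exists, so a fresh path is the safer choice.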