pyspark.RDD.saveAsHadoopDataset

RDD.saveAsHadoopDataset(conf, keyConverter=None, valueConverter=None)

Output a Python RDD of key-value pairs (of form RDD[(K, V)]) to any Hadoop file system, using the old Hadoop OutputFormat API (mapred package). Keys and values are converted for output using either user-specified converters or, by default, “org.apache.spark.api.python.JavaToWritableConverter”.

Parameters
  • conf – Hadoop job configuration, passed in as a dict

  • keyConverter – fully qualified class name of the key converter (None by default)

  • valueConverter – fully qualified class name of the value converter (None by default)
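
Example

The following is a minimal sketch of how the conf dict might be assembled and passed in. The helper names build_hadoop_conf and save_pairs are hypothetical and exist only for illustration. The property names and the SequenceFileOutputFormat, Text, and IntWritable classes are assumptions about a typical Hadoop setup; the exact keys accepted can vary with the Hadoop version in use.

```python
def build_hadoop_conf(output_dir):
    """Return a job configuration dict for the old (mapred) OutputFormat API.

    Hypothetical helper: the keys below are common Hadoop property names,
    assumed here rather than taken from the method's documentation.
    """
    return {
        # Old-API output format class (org.apache.hadoop.mapred package).
        "mapred.output.format.class":
            "org.apache.hadoop.mapred.SequenceFileOutputFormat",
        # Writable types that keys and values are converted to on output.
        "mapreduce.job.output.key.class": "org.apache.hadoop.io.Text",
        "mapreduce.job.output.value.class": "org.apache.hadoop.io.IntWritable",
        # Destination directory on the target Hadoop file system.
        "mapreduce.output.fileoutputformat.outputdir": output_dir,
    }


def save_pairs(rdd, output_dir):
    """Write an RDD of (str, int) pairs using the default converters.

    Hypothetical helper: keyConverter and valueConverter are left as None,
    so the default JavaToWritableConverter handles the conversion.
    """
    conf = build_hadoop_conf(output_dir)
    rdd.saveAsHadoopDataset(conf=conf)
    return conf
```

With an active SparkContext named sc, a call might look like save_pairs(sc.parallelize([("a", 1), ("b", 2)]), "/tmp/pairs_out"), where the output path is a placeholder. Hadoop output formats typically refuse to write into a directory that already exists.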