pyspark.sql.DataFrameWriter.json

DataFrameWriter.json(path, mode=None, compression=None, dateFormat=None, timestampFormat=None, lineSep=None, encoding=None, ignoreNullFields=None)[source]

Saves the content of the DataFrame in JSON format (JSON Lines text format or newline-delimited JSON) at the specified path.

Parameters
    path – the path in any Hadoop supported file system.
    mode – specifies the behavior of the save operation when data already exists.
        append: Append contents of this DataFrame to existing data.
        overwrite: Overwrite existing data.
        ignore: Silently ignore this operation if data already exists.
        error or errorifexists (default case): Throw an exception if data already exists.
    compression – compression codec to use when saving to file. This can be one of the known case-insensitive shortened names (none, bzip2, gzip, lz4, snappy and deflate).
    dateFormat – sets the string that indicates a date format. Custom date formats follow the formats at `datetime pattern`_. This applies to date type. If None is set, it uses the default value, yyyy-MM-dd.
    timestampFormat – sets the string that indicates a timestamp format. Custom date formats follow the formats at `datetime pattern`_. This applies to timestamp type. If None is set, it uses the default value, yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX].
    encoding – specifies the encoding (charset) of saved JSON files. If None is set, the default UTF-8 charset will be used.
    lineSep – defines the line separator that should be used for writing. If None is set, it uses the default value, \n.
    ignoreNullFields – whether to ignore null fields when generating JSON objects. If None is set, it uses the default value, true.
>>> df.write.json(os.path.join(tempfile.mkdtemp(), 'data'))
New in version 1.4.
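To make the output format concrete, here is a minimal sketch of what "JSON Lines" means, using only the Python standard library rather than Spark itself. The records, field names, and the helper `to_json_lines` are invented for illustration; the sketch assumes the behavior described above, namely one JSON object per line joined by lineSep (default \n), with null fields dropped when ignoreNullFields is true:

```python
import json

# Hypothetical records standing in for DataFrame rows (invented for illustration).
rows = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": None},
]

def to_json_lines(records, ignore_null_fields=True, line_sep="\n"):
    """Serialize records as newline-delimited JSON (one object per line)."""
    out = []
    for record in records:
        if ignore_null_fields:
            # Mirrors ignoreNullFields=true: drop keys whose value is null.
            record = {k: v for k, v in record.items() if v is not None}
        out.append(json.dumps(record))
    return line_sep.join(out)

print(to_json_lines(rows))
# {"name": "Alice", "age": 30}
# {"name": "Bob"}
```

Each output file Spark writes under the target directory has this line-per-record shape, which is why the result can be split and read back in parallel.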