pyspark.sql.DataFrameWriter.partitionBy
DataFrameWriter.partitionBy(*cols)
Partitions the output by the given columns on the file system.
If specified, the output is laid out on the file system similarly to Hive's partitioning scheme.
- Parameters
cols – names of the columns to partition the output by
>>> df.write.partitionBy('year', 'month').parquet(os.path.join(tempfile.mkdtemp(), 'data'))
New in version 1.4.
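A minimal, self-contained sketch of the call shown above. It assumes a local SparkSession and uses a hypothetical DataFrame with year, month, and value columns; the column names and sample rows are illustrative, not part of the API.

>>> import os
>>> import tempfile
>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.appName("partitionBy-example").getOrCreate()
>>> # Hypothetical sample data containing the partitioning columns.
>>> df = spark.createDataFrame(
...     [(2023, 1, "a"), (2023, 2, "b"), (2024, 1, "c")],
...     ["year", "month", "value"],
... )
>>> out_dir = os.path.join(tempfile.mkdtemp(), "data")
>>> # Each distinct (year, month) pair becomes its own subdirectory,
>>> # e.g. <out_dir>/year=2023/month=1/part-....parquet
>>> df.write.partitionBy("year", "month").parquet(out_dir)
>>> # Reading the output back reconstructs the partition columns
>>> # from the directory names.
>>> spark.read.parquet(out_dir).count()
3

Because the partition values are encoded in directory names rather than stored inside the data files, filters on the partition columns can skip entire directories when the data is read back.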