pyspark.sql.DataFrame.unionByName
DataFrame.unionByName(other)

Returns a new DataFrame containing the union of rows in this and another DataFrame.

This is different from both UNION ALL and UNION DISTINCT in SQL. To do a SQL-style set union (that does deduplication of elements), use this function followed by distinct().

The difference between this function and union() is that this function resolves columns by name (not by position):

>>> df1 = spark.createDataFrame([[1, 2, 3]], ["col0", "col1", "col2"])
>>> df2 = spark.createDataFrame([[4, 5, 6]], ["col1", "col2", "col0"])
>>> df1.unionByName(df2).show()
+----+----+----+
|col0|col1|col2|
+----+----+----+
|   1|   2|   3|
|   6|   4|   5|
+----+----+----+

New in version 2.3.
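
The by-name versus by-position distinction described above can be illustrated without a Spark session. The following is a minimal plain-Python sketch, not Spark's implementation: the helpers union_by_position, union_by_name, and distinct are hypothetical names introduced only for this illustration, and raising ValueError on mismatched columns is an assumption of the model rather than a statement about Spark's error type.

```python
def union_by_position(cols_a, rows_a, cols_b, rows_b):
    """Model of a positional union: keep the left column names and
    append the right rows unchanged, ignoring the right column names."""
    if len(cols_a) != len(cols_b):
        raise ValueError("column counts differ")  # assumption of this model
    return cols_a, rows_a + rows_b


def union_by_name(cols_a, rows_a, cols_b, rows_b):
    """Model of a by-name union: reorder each right row so its values
    line up with the left column names before appending."""
    if sorted(cols_a) != sorted(cols_b):
        raise ValueError("column names differ")  # assumption of this model
    # For each left column, find where that column sits in the right input.
    index = [cols_b.index(c) for c in cols_a]
    reordered = [tuple(row[i] for i in index) for row in rows_b]
    return cols_a, rows_a + reordered


def distinct(rows):
    """Model of deduplication: drop repeated rows, keeping first occurrences."""
    return list(dict.fromkeys(rows))


# Same data as the example above.
cols1, rows1 = ["col0", "col1", "col2"], [(1, 2, 3)]
cols2, rows2 = ["col1", "col2", "col0"], [(4, 5, 6)]

_, by_position = union_by_position(cols1, rows1, cols2, rows2)
_, by_name = union_by_name(cols1, rows1, cols2, rows2)

print(by_position)  # [(1, 2, 3), (4, 5, 6)]
print(by_name)      # [(1, 2, 3), (6, 4, 5)]

# Neither union removes duplicates; a separate distinct step is needed.
_, doubled = union_by_name(cols1, rows1, cols1, rows1)
print(doubled)            # [(1, 2, 3), (1, 2, 3)]
print(distinct(doubled))  # [(1, 2, 3)]
```

The second printed row of by_name matches the second row of the show() output above: the value 6 belongs to col0 in df2, so it moves to the first position. The last two lines mirror the advice to follow the union with distinct() when set semantics are wanted.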