pyspark.sql.functions.approx_count_distinct
pyspark.sql.functions.approx_count_distinct(col, rsd=None)
Aggregate function: returns a new Column containing the approximate distinct count of column col.
Parameters
rsd – maximum allowed estimation error, expressed as a relative standard deviation (default = 0.05). For rsd < 0.01, it is more efficient to use countDistinct().
>>> df.agg(approx_count_distinct(df.age).alias('distinct_ages')).collect()
[Row(distinct_ages=2)]
New in version 2.1.
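
The guidance that small rsd values favor countDistinct() can be illustrated with a rough sizing sketch. Spark's approximate distinct count is based on a HyperLogLog-style sketch; the code below uses the textbook HyperLogLog standard error of about 1.04 / sqrt(m) for m registers, which is an assumption made for illustration and not necessarily the exact formula Spark applies internally. The helper name hll_registers_for_rsd is hypothetical and not part of PySpark.

```python
import math


def hll_registers_for_rsd(rsd):
    """Estimate how many registers a HyperLogLog-style sketch needs.

    Assumes the textbook standard error 1.04 / sqrt(m); this is an
    illustrative approximation, not Spark's internal sizing rule.
    """
    if rsd <= 0:
        raise ValueError("rsd must be positive")
    # Smallest m satisfying 1.04 / sqrt(m) <= rsd
    m_min = (1.04 / rsd) ** 2
    # Registers are addressed by p hash bits, so round up to a power of two
    p = max(4, math.ceil(math.log2(m_min)))
    return 2 ** p


# Compare the default rsd with two tighter settings
for rsd in (0.05, 0.01, 0.005):
    print(rsd, hll_registers_for_rsd(rsd))
```

Under this approximation the sketch size grows roughly with 1 / rsd², so halving rsd about quadruples the number of registers. That is one plausible reason an exact count becomes competitive for very small rsd.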