pyspark.sql.DataFrame.distinct
DataFrame.distinct() → pyspark.sql.dataframe.DataFrame
Returns a new DataFrame containing the distinct rows in this DataFrame.
New in version 1.3.0.
Changed in version 3.4.0: Supports Spark Connect.
Returns
DataFrame
    DataFrame with distinct records.
Examples
>>> df = spark.createDataFrame(
...     [(14, "Tom"), (23, "Alice"), (23, "Alice")], ["age", "name"])
Return the number of distinct rows in the DataFrame.
>>> df.distinct().count()
2
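To make the row-level semantics concrete, here is a minimal plain-Python sketch of the idea, not Spark's implementation: two rows count as duplicates only when every column matches. The helper name `distinct_rows` and the extra `(23, "Bob")` row are assumptions made for illustration, and the sketch preserves first-seen order only as a convenience of the helper.

```python
def distinct_rows(rows):
    """Return the unique rows, comparing all columns (hypothetical helper)."""
    # dict.fromkeys keeps the first occurrence of each hashable tuple,
    # so duplicates are dropped while first-seen order is preserved.
    return list(dict.fromkeys(rows))


rows = [(14, "Tom"), (23, "Alice"), (23, "Alice"), (23, "Bob")]
unique = distinct_rows(rows)

# (23, "Alice") and (23, "Bob") share an age but differ in name,
# so both are kept; only the exact repeat is removed.
print(len(unique))  # 3
```

Because the whole row is compared, rows that agree on some columns but differ on any other are all retained.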