pyspark.sql.DataFrameStatFunctions.corr
DataFrameStatFunctions.corr(col1: str, col2: str, method: Optional[str] = None) → float
- Calculates the correlation of two columns of a DataFrame as a double value. Currently only supports the Pearson Correlation Coefficient. DataFrame.corr() and DataFrameStatFunctions.corr() are aliases of each other.
- New in version 1.4.0.
- Changed in version 3.4.0: Supports Spark Connect.
- Parameters
- col1 : str
- The name of the first column.
- col2 : str
- The name of the second column.
- method : str, optional
- The correlation method. Currently only supports "pearson".
 
- Returns
- float
- Pearson Correlation Coefficient of two columns. 
 
- Examples
>>> df = spark.createDataFrame([(1, 12), (10, 1), (19, 8)], ["c1", "c2"])
>>> df.corr("c1", "c2")
-0.3592106040535498
>>> df = spark.createDataFrame([(11, 12), (10, 11), (9, 10)], ["small", "bigger"])
>>> df.corr("small", "bigger")
1.0
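To make the returned value concrete, the sketch below computes the Pearson correlation coefficient in plain Python for the same rows used in the examples. It is only an illustration of the formula, not Spark's implementation. The helper name `pearson` is hypothetical and is not part of the PySpark API.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences.

    Hypothetical helper for illustration only; not a PySpark function.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each value from its column mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sum of cross-deviations divided by the product of the deviation norms.
    # A constant column makes the norm zero, so this sketch raises
    # ZeroDivisionError in that case.
    cross = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return cross / norm


# Same rows as the two DataFrames in the examples.
r1 = pearson([1, 10, 19], [12, 1, 8])      # columns "c1", "c2"
r2 = pearson([11, 10, 9], [12, 11, 10])    # columns "small", "bigger"
print(r1, r2)
```

Up to floating-point rounding, `r1` and `r2` should agree with the values returned by `df.corr(...)` in the examples. The second pair is perfectly linear, which is why its coefficient is 1.0.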