pyspark.sql.functions.grouping_id

pyspark.sql.functions.grouping_id(*cols: ColumnOrName) → pyspark.sql.column.Column

Aggregate function: returns the level of grouping, equal to

(grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + … + grouping(cn)

New in version 2.0.0.

Notes

The list of columns must match the grouping columns exactly, or be empty (meaning all the grouping columns).

Examples

>>> from pyspark.sql.functions import grouping_id, sum
>>> df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ("name", "age"))
>>> df.cube("name").agg(grouping_id(), sum("age")).orderBy("name").show()
+-----+-------------+--------+
| name|grouping_id()|sum(age)|
+-----+-------------+--------+
| null|            1|       7|
|Alice|            0|       2|
|  Bob|            0|       5|
+-----+-------------+--------+
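The formula above can be illustrated without Spark: each `grouping(ci)` contributes one bit, with the first grouping column in the most significant position. A minimal sketch (the function name `grouping_id_value` is hypothetical, not part of the PySpark API):

```python
def grouping_id_value(grouping_bits):
    """Combine per-column grouping bits into a grouping id.

    grouping_bits[i] is grouping(ci): 1 if column ci is aggregated
    (rolled up, shown as null) in this grouping set, 0 if it is part
    of the grouping key. The first column maps to the highest bit.
    """
    gid = 0
    for bit in grouping_bits:
        gid = (gid << 1) | bit
    return gid

# With two grouping columns (c1, c2):
# both grouped normally -> 0, only c2 rolled up -> 1,
# only c1 rolled up -> 2, grand total -> 3.
```

For the single-column cube in the example, the grand-total row (`name` is null) gets grouping id 1 and every per-name row gets 0, matching the output table.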