pyspark.pandas.groupby.GroupBy.diff

GroupBy.diff(periods: int = 1) → FrameLike

First discrete difference of each element within its group.

Calculates the difference of a DataFrame element compared with another element in the DataFrame group (default is the element in the same column of the previous row).

Parameters

periods : int, default 1
    Periods to shift for calculating difference, accepts negative values.

Returns

diffed : DataFrame or Series

See also

pyspark.pandas.Series.groupby
pyspark.pandas.DataFrame.groupby

Examples

>>> df = ps.DataFrame({'a': [1, 2, 3, 4, 5, 6],
...                    'b': [1, 1, 2, 3, 5, 8],
...                    'c': [1, 4, 9, 16, 25, 36]}, columns=['a', 'b', 'c'])
>>> df
   a  b   c
0  1  1   1
1  2  1   4
2  3  2   9
3  4  3  16
4  5  5  25
5  6  8  36

>>> df.groupby(['b']).diff().sort_index()
     a    c
0  NaN  NaN
1  1.0  3.0
2  NaN  NaN
3  NaN  NaN
4  NaN  NaN
5  NaN  NaN

Difference with the previous row within each group, for a single column.

>>> df.groupby(['b'])['a'].diff().sort_index()
0    NaN
1    1.0
2    NaN
3    NaN
4    NaN
5    NaN
Name: a, dtype: float64
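
The grouped difference shown above, including the effect of a negative `periods`, can be illustrated with a minimal pure-Python sketch. The helper `grouped_diff` is hypothetical and only mirrors the documented semantics on plain lists; it is not how pandas-on-Spark computes the result.

```python
from collections import defaultdict


def grouped_diff(keys, values, periods=1):
    """Difference of each value with the value `periods` rows away in its group.

    Illustrative only: plain lists stand in for the grouping column and the
    value column. This is an assumption-based sketch, not the library's code.
    """
    # Collect the row positions that belong to each group, in row order.
    positions = defaultdict(list)
    for row, key in enumerate(keys):
        positions[key].append(row)

    # Rows with no partner inside their group stay NaN.
    result = [float("nan")] * len(values)
    for rows in positions.values():
        for i, row in enumerate(rows):
            j = i - periods  # position, within the group, of the row to subtract
            if 0 <= j < len(rows):
                result[row] = float(values[row] - values[rows[j]])
    return result


# Same data as columns 'b' (grouping key) and 'a' (values) in the example.
b = [1, 1, 2, 3, 5, 8]
a = [1, 2, 3, 4, 5, 6]

forward = grouped_diff(b, a)               # periods=1: compare with the previous row
backward = grouped_diff(b, a, periods=-1)  # periods=-1: compare with the next row
```

Only the group `b == 1` has two rows, so `forward` has a single non-NaN entry at row 1, and `backward` has a single non-NaN entry at row 0, where the next row of the group is subtracted instead of the previous one.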
