pyspark.pandas.Series.value_counts
Series.value_counts(normalize: bool = False, sort: bool = True, ascending: bool = False, bins: None = None, dropna: bool = True) → Series
- Return a Series containing counts of unique values. The resulting object will be in descending order so that the first element is the most frequently-occurring element. Excludes NA values by default.
- Parameters
- normalize : boolean, default False
- If True then the object returned will contain the relative frequencies of the unique values.
- sort : boolean, default True
- Sort by frequencies.
- ascending : boolean, default False
- Sort in ascending order.
- bins : Not Yet Supported
- dropna : boolean, default True
- Don’t include counts of NaN.
 
- Returns
- counts : Series
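
The parameters described above can be illustrated without a Spark session. The following is a minimal plain-Python sketch of the documented semantics, not the pandas-on-Spark implementation; the function name `value_counts_sketch` and its list-of-tuples return value are assumptions made for illustration only.

```python
import math
from collections import Counter


def value_counts_sketch(values, normalize=False, sort=True,
                        ascending=False, dropna=True):
    """Hypothetical helper mirroring the documented parameters."""
    def is_na(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    # Count each distinct value; all missing values share a single None key.
    counts = Counter(None if is_na(v) else v for v in values)
    if dropna:
        counts.pop(None, None)

    items = list(counts.items())
    if sort:
        # Order by count; ascending=False puts the most frequent value first.
        items.sort(key=lambda kv: kv[1], reverse=not ascending)

    if normalize:
        # Relative frequency: each count divided by the sum of all counts.
        total = sum(c for _, c in items)
        items = [(v, c / total) for v, c in items]
    return items


data = [0, 0, 1, 1, 1, float("nan")]
print(value_counts_sketch(data))                  # [(1, 3), (0, 2)]
print(value_counts_sketch(data, normalize=True))  # [(1, 0.6), (0, 0.4)]
print(value_counts_sketch(data, dropna=False))    # [(1, 3), (0, 2), (None, 1)]
print(value_counts_sketch(data, ascending=True))  # [(0, 2), (1, 3)]
```

In this sketch, values with equal counts keep their first-seen order; that is a property of Python's stable sort and should not be read as a guarantee about the real method.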
 
- See also
- Series.count
- Number of non-NA elements in a Series.
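
Because NA values are excluded by default, the per-value counts add up to the number of non-NA elements, which is the quantity `Series.count` reports. A small plain-Python sketch of that relationship follows (it uses `collections.Counter` as a stand-in and does not call pandas-on-Spark):

```python
import math
from collections import Counter

data = [0, 0, 1, 1, 1, float("nan")]

# Keep only non-missing entries, as the default dropna=True does.
non_na = [v for v in data if not (isinstance(v, float) and math.isnan(v))]

non_na_count = len(non_na)   # analogue of Series.count
per_value = Counter(non_na)  # analogue of value_counts

# The individual counts sum to the number of non-NA elements.
assert sum(per_value.values()) == non_na_count
print(non_na_count, dict(per_value))  # 5 {0: 2, 1: 3}
```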
- Examples

>>> import numpy as np
>>> import pandas as pd
>>> import pyspark.pandas as ps

For Series

>>> df = ps.DataFrame({'x':[0, 0, 1, 1, 1, np.nan]})
>>> df.x.value_counts()
1.0    3
0.0    2
Name: x, dtype: int64

normalize

With normalize set to True, returns the relative frequency by dividing all values by the sum of values.

>>> df.x.value_counts(normalize=True)
1.0    0.6
0.0    0.4
Name: x, dtype: float64

dropna

With dropna set to False we can also see NaN index values.

>>> df.x.value_counts(dropna=False)
1.0    3
0.0    2
NaN    1
Name: x, dtype: int64

For Index

>>> idx = ps.Index([3, 1, 2, 3, 4, np.nan])
>>> idx
Float64Index([3.0, 1.0, 2.0, 3.0, 4.0, nan], dtype='float64')

>>> idx.value_counts().sort_index()
1.0    1
2.0    1
3.0    2
4.0    1
dtype: int64

sort

With sort set to False, the result is not sorted by count.

>>> idx.value_counts(sort=False).sort_index()
1.0    1
2.0    1
3.0    2
4.0    1
dtype: int64

normalize

With normalize set to True, returns the relative frequency by dividing all values by the sum of values.

>>> idx.value_counts(normalize=True).sort_index()
1.0    0.2
2.0    0.2
3.0    0.4
4.0    0.2
dtype: float64

dropna

With dropna set to False we can also see NaN index values.

>>> idx.value_counts(dropna=False).sort_index()
1.0    1
2.0    1
3.0    2
4.0    1
NaN    1
dtype: int64

For MultiIndex

>>> midx = pd.MultiIndex([['lama', 'cow', 'falcon'],
...                       ['speed', 'weight', 'length']],
...                      [[0, 0, 0, 1, 1, 1, 2, 2, 2],
...                       [1, 1, 1, 1, 1, 2, 1, 2, 2]])
>>> s = ps.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3], index=midx)
>>> s.index
MultiIndex([(  'lama', 'weight'),
            (  'lama', 'weight'),
            (  'lama', 'weight'),
            (   'cow', 'weight'),
            (   'cow', 'weight'),
            (   'cow', 'length'),
            ('falcon', 'weight'),
            ('falcon', 'length'),
            ('falcon', 'length')],
           )

>>> s.index.value_counts().sort_index()
(cow, length)       1
(cow, weight)       2
(falcon, length)    2
(falcon, weight)    1
(lama, weight)      3
dtype: int64

>>> s.index.value_counts(normalize=True).sort_index()
(cow, length)       0.111111
(cow, weight)       0.222222
(falcon, length)    0.222222
(falcon, weight)    0.111111
(lama, weight)      0.333333
dtype: float64

If the Index has a name, the result keeps it.

>>> idx = ps.Index([0, 0, 0, 1, 1, 2, 3], name='pandas-on-Spark')
>>> idx.value_counts().sort_index()
0    3
1    2
2    1
3    1
Name: pandas-on-Spark, dtype: int64