RDD.
reduceByKey
Merge the values for each key using an associative and commutative reduce function.
This will also perform the merging locally on each mapper before sending results to a reducer, similarly to a “combiner” in MapReduce.
Output will be partitioned with numPartitions partitions, or the default parallelism level if numPartitions is not specified. Default partitioner is hash-partition.
New in version 1.6.0.
the reduce function
the number of partitions in new RDD
RDD
function to compute the partition index
a RDD containing the keys and the aggregated result for each key
See also
RDD.reduceByKeyLocally()
RDD.combineByKey()
RDD.aggregateByKey()
RDD.foldByKey()
RDD.groupByKey()
Examples
>>> from operator import add >>> rdd = sc.parallelize([("a", 1), ("b", 1), ("a", 1)]) >>> sorted(rdd.reduceByKey(add).collect()) [('a', 2), ('b', 1)]