hadoop-mapreduce-user mailing list archives

From Jens Scheidtmann <jens.scheidtm...@gmail.com>
Subject Re: Difference between combiner and aggregator
Date Sat, 06 Apr 2013 20:11:45 GMT
Dear jamal sasha,

The usual example is word count:

class Mapper
  method MAP(Line l)
    document <- split l into Terms
    for all Term t in document
      EMIT(Term t, Count 1)

class Combiner
  method REDUCE(Term t, List of Counts lc)
    cnt <- sum lc
    EMIT(Term t, Count cnt)

class Reducer
  method REDUCE(Term t, List of Counts lc)
    cnt <- sum lc
    EMIT(Term t, Count cnt)
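A minimal Python simulation of the pseudocode above, assuming one list of lines per map task and whitespace tokenization. The function names (`map_line`, `reduce_counts`, `group_by_key`, `run_job`) are hypothetical and this is not the Hadoop API. It runs the same job with and without the combiner and reports how many records cross the shuffle.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, int]

def map_line(line: str) -> Iterator[Pair]:
    # Mapper: emit (term, 1) for every whitespace-separated term.
    for term in line.split():
        yield term, 1

def reduce_counts(term: str, counts: Iterable[int]) -> Pair:
    # Shared by combiner and reducer: sum the partial counts for one term.
    return term, sum(counts)

def group_by_key(pairs: Iterable[Pair]) -> Dict[str, List[int]]:
    # Collect all values emitted for the same key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def run_job(splits: List[List[str]], use_combiner: bool) -> Tuple[Dict[str, int], int]:
    """Return (final counts, number of records shuffled to the reducers)."""
    shuffled: List[Pair] = []
    for split in splits:  # each split is handled by one simulated map task
        map_output = [pair for line in split for pair in map_line(line)]
        if use_combiner:
            # Combiner: runs locally on this task's output, before the shuffle.
            map_output = [reduce_counts(t, cs) for t, cs in group_by_key(map_output).items()]
        shuffled.extend(map_output)
    # Shuffle: group everything by term across tasks, then run the reducer.
    result = dict(reduce_counts(t, cs) for t, cs in group_by_key(shuffled).items())
    return result, len(shuffled)
```

With `splits = [["a b a", "b a"], ["a c"]]` both runs return `{"a": 4, "b": 2, "c": 1}`, but the combiner cuts the shuffled records from 7 to 4. In an actual Hadoop job the combiner is registered with `Job.setCombinerClass`, often pointing at the same class as the reducer.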


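The combiner runs locally on each mapper's node, on the map output, before
the shuffle. When it runs, its output (rather than the raw map output) is
what the reducers receive after the shuffle. A combiner is purely an I/O
optimization: by pre-aggregating, it reduces the data written to local disk
and sent over the network. The framework makes no guarantee whether the
combiner will be called on a given output at all, once, or several times, so
the job must produce the same result in every case. That is why the
reducer's summing logic can double as the combiner here: addition is
associative and commutative.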
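Because the combiner may run zero, one, or several times, word count only works if summing partial counts gives the same final answer however the map output is chunked and combined. The sketch below checks that under a toy model, which is an assumption and not Hadoop's actual spill/merge logic: a hypothetical `spill_and_combine` applies the combiner to a random chunk of the map output a given number of times before the reducer sees it.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, int]

def combine(pairs: List[Pair]) -> List[Pair]:
    # Word-count combiner: sum partial counts per term.
    totals: Dict[str, int] = defaultdict(int)
    for term, count in pairs:
        totals[term] += count
    return list(totals.items())

def spill_and_combine(pairs: List[Pair], times: int, rng: random.Random) -> List[Pair]:
    # Hypothetical framework model: apply the combiner `times` times,
    # each time to an arbitrary chunk of the (reordered) map output.
    out = list(pairs)
    for _ in range(times):
        rng.shuffle(out)
        cut = rng.randint(0, len(out))
        out = combine(out[:cut]) + out[cut:]
    return out

def reduce_all(pairs: List[Pair]) -> Dict[str, int]:
    # Reducer: same summing logic, applied to everything that reached it.
    return dict(combine(pairs))
```

For the map output `[("a", 1), ("b", 1), ("a", 1), ("b", 1), ("a", 1)]`, `reduce_all` yields `{"a": 3, "b": 2}` whether the combiner ran 0, 1, 2, or 5 times.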

Best regards,

Jens
