hadoop-common-user mailing list archives

From "Saptarshi Guha" <saptarshi.g...@gmail.com>
Subject Combiner run specification and questions
Date Fri, 02 Jan 2009 17:57:01 GMT
I would like to confirm when the Combiner runs (since it might not be
run at all; see below). I read somewhere that it is run if there is at
least one reduce (which in my case I can be sure of).
I also read that the combiner is an optimization. However, it is also
a chance for a function to transform the key/value (keeping the class
the same, i.e. the combiner semantics are not changed) and deal with a
smaller set (this could be done in the reducer, but the number of
values for a key might be relatively large).

However, I guess it would be a mistake for the reducer to expect its
input to come from a combiner? E.g. if there are only 10 values
corresponding to a key (as output by the mapper), will these 10 values
go straight to the reducer, or to the reducer via the combiner?

Here I am assuming my reduce operation does not need all the values
for a key to work (so that a combiner can be used), i.e. it is additive.
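
To make the additive case concrete, here is a minimal sketch in plain
Python. It is not Hadoop API code: the names combine, reduce_sum,
reduce_count and run are hypothetical, and the combiner is simply
applied a fixed number of times to mimic "0, 1, or many" calls. It
shows a summing reducer giving the same answer however often the
combiner runs, while a reducer that counts records does not.

```python
from collections import defaultdict


def combine(pairs):
    """Hypothetical combiner: sums values per key, keeping (key, int) types."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def reduce_sum(pairs):
    """Additive reducer: correct whether or not a combiner ran first."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def reduce_count(pairs):
    """Counter-example: counts records per key, so it depends on the combiner."""
    counts = defaultdict(int)
    for key, _ in pairs:
        counts[key] += 1
    return dict(counts)


def run(mapper_output, combiner_passes, reducer):
    """Apply the combiner `combiner_passes` times (0 means never), then reduce."""
    pairs = list(mapper_output)
    for _ in range(combiner_passes):
        pairs = combine(pairs)
    return reducer(pairs)


# Ten values for one key, as in the question above.
mapper_output = [("k", 1)] * 10

sums = [run(mapper_output, n, reduce_sum) for n in (0, 1, 3)]
counts = [run(mapper_output, n, reduce_count) for n in (0, 1, 3)]
```

Here sums holds the same total for every number of combiner passes,
whereas counts changes once the combiner has run, which is why a
reducer should not rely on its input having been combined.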

Thank you

On Sun, Nov 16, 2008 at 6:18 PM, Owen O'Malley <omalley@apache.org> wrote:
> The Combiner may be called 0, 1, or many times on each key between the
> mapper and reducer. Combiners are just an application-specific optimization
> that compresses the intermediate output. They should not have side effects or
> transform the types. Unfortunately, since there isn't a separate interface
> for Combiners, there isn't a great place to document this requirement.
> I've just filed HADOOP-4668 to improve the documentation.

Saptarshi Guha - saptarshi.guha@gmail.com
