hadoop-common-user mailing list archives

From "Jim Twensky" <jim.twen...@gmail.com>
Subject Re: Combiner run specification and questions
Date Fri, 02 Jan 2009 20:00:41 GMT
Hello Saptarshi,

>>E.g., if there are only 10 values corresponding
>>to a key (as output by the mapper), will these 10 values go straight
>>to the reducer or to the reducer via the combiner?

It depends on whether you call JobConf.setCombinerClass(). If you don't,
Hadoop does not run any combiner by default. If you use your reducer class
as the combiner, you must make sure that your mapper and reducer outputs are
of the same type; otherwise you will get a runtime error about mismatched
types. Even with a combiner set, Owen's note quoted below says it may be
called 0, 1, or many times on each key, so your reducer should not rely on
its input having passed through the combiner. In your case, I strongly
recommend using a combiner to reduce the size of the intermediate data. My
understanding is that combiners are just local reducers that run right
after the completion of the map step.
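
To illustrate why an additive reduce is safe here, below is a minimal
sketch in plain Java (no Hadoop classes). The class CombinerSketch and its
methods sum() and combine() are hypothetical names made up for this
example, and the chunking only imitates a combiner being applied to part of
one key's values; it is not how Hadoop actually schedules combiners.

```java
import java.util.ArrayList;
import java.util.List;

public class CombinerSketch {
    // Hypothetical stand-in for a summing reduce(): input and output
    // have the same type, so it can double as a combiner.
    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    // Imitates one combiner pass: sums each chunk of values separately
    // and returns the partial sums.
    static List<Long> combine(List<Long> values, int chunkSize) {
        List<Long> partials = new ArrayList<>();
        for (int i = 0; i < values.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, values.size());
            partials.add(sum(values.subList(i, end)));
        }
        return partials;
    }

    public static void main(String[] args) {
        // 10 values for a single key, as emitted by the mapper.
        List<Long> mapOutput = new ArrayList<>();
        for (long v = 1; v <= 10; v++) {
            mapOutput.add(v);
        }

        long noCombiner = sum(mapOutput);                     // combiner ran 0 times
        long once = sum(combine(mapOutput, 4));               // combiner ran once
        long twice = sum(combine(combine(mapOutput, 2), 3));  // combiner ran twice

        // Prints "55 55 55": the final result does not depend on how
        // often the combiner ran.
        System.out.println(noCombiner + " " + once + " " + twice);
    }
}
```

The same would not hold for something like an average, where combining
partial results changes the answer unless the counts are carried along.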

Jim

On Fri, Jan 2, 2009 at 11:57 AM, Saptarshi Guha <saptarshi.guha@gmail.com> wrote:

> Hello,
> I would just like to confirm when the Combiner runs (since it
> might not be run at all, see below). I read somewhere that it is run
> if there is at least one reduce (which in my case I can be sure of).
> I also read that the combiner is an optimization. However, it is also
> a chance for a function to transform the key/value (keeping the class
> the same, i.e. the combiner semantics are not changed) and deal with a
> smaller set (this could be done in the reducer, but the number of
> values for a key might be relatively large).
>
> However, I guess it would be a mistake for the reducer to expect its
> input to come from a combiner? E.g., if there are only 10 values
> corresponding to a key (as output by the mapper), will these 10 values
> go straight to the reducer or to the reducer via the combiner?
>
> Here I am assuming my reduce operation does not need all the values
> for a key to work (so that a combiner can be used), i.e. additive
> operations.
>
> Thank you
> Saptarshi
>
>
> On Sun, Nov 16, 2008 at 6:18 PM, Owen O'Malley <omalley@apache.org> wrote:
> > The Combiner may be called 0, 1, or many times on each key between the
> > mapper and reducer. Combiners are just an application-specific
> > optimization that compresses the intermediate output. They should not
> > have side effects or transform the types. Unfortunately, since there
> > isn't a separate interface for Combiners, there isn't a great place to
> > document this requirement. I've just filed HADOOP-4668 to improve the
> > documentation.
>
>
>
> --
> Saptarshi Guha - saptarshi.guha@gmail.com
>
