hadoop-common-user mailing list archives

From Devaraj Das <d...@yahoo-inc.com>
Subject Re: setting a different input/output class for combiner function than map and reduce functions
Date Wed, 24 Sep 2008 09:24:29 GMT
If you are on 0.18, you can specify that the combiner be invoked only once
per partition per spill. Do
job.setCombineOnlyOnce(true);
or set the value of "mapred.combine.once" to true in your conf.
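As a minimal sketch of how that setup might look with the 0.18 JobConf API (the
setCombineOnlyOnce call and the mapred.combine.once property are taken from the
message above; the class name is a placeholder):

import org.apache.hadoop.mapred.JobConf;

public class CombineOnceExample {
    public static JobConf configure() {
        JobConf conf = new JobConf(CombineOnceExample.class);

        // Convenience method quoted above (0.18), assumed to exist as described:
        // conf.setCombineOnlyOnce(true);

        // Equivalent: set the property directly on the job configuration.
        conf.setBoolean("mapred.combine.once", true);

        return conf;
    }
}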



On 9/24/08 2:28 PM, "Palleti, Pallavi" <pallavi.palleti@corp.aol.com> wrote:

> Is it possible to ensure that the combiner runs only once?
> 
> Thanks
> Pallavi
> 
> -----Original Message-----
> From: owen.omalley@gmail.com [mailto:owen.omalley@gmail.com] On Behalf Of Owen
> O'Malley
> Sent: Wednesday, September 24, 2008 6:42 AM
> To: core-user@hadoop.apache.org
> Subject: Re: setting a different input/output class for combiner function than
> map and reduce functions
> 
> On Tue, Sep 23, 2008 at 5:40 PM, Sandy <snickerdoodle08@gmail.com> wrote:
> 
>> 
>> I just wrote a combiner class to try and speed things up. However, now I
>> want to do something like the following:
>> ==map phase==
>> input: key = LongWritable value = Text,
>> output: key = Text, value = LongWritable
>> 
>> ==combiner==
>> input: key = Text, value = Iterator<LongWritable>
>> output: key = Text, value = Text
> 
> 
> The input and output types for the combiner *must* be the same. The combiner
> may be applied 0, 1, or many times between the map and the reduce. So,
> combiners must:
>   * not depend on being run exactly once
>   * not have side effects
> 
> InputFormat -> Map -> Combiner* -> Reduce -> OutputFormat
> 
> Since the Combiner may run more than once, it can't do type transformations.
> 
> -- Owen
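
To make the constraint concrete, a combiner whose input and output types are
identical (Text keys, LongWritable values) could look like the following sketch
against the 0.18-era org.apache.hadoop.mapred API (the class name SumCombiner
is illustrative, not from the thread):

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Sums the counts for a key. Because it both consumes and emits
// (Text, LongWritable), it is safe to run zero, one, or many times
// between the map and the reduce.
public class SumCombiner extends MapReduceBase
        implements Reducer<Text, LongWritable, Text, LongWritable> {

    public void reduce(Text key, Iterator<LongWritable> values,
                       OutputCollector<Text, LongWritable> output,
                       Reporter reporter) throws IOException {
        long sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new LongWritable(sum));
    }
}

It would be registered with conf.setCombinerClass(SumCombiner.class); since its
input and output types match, the same class can also serve as the reducer.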


