hadoop-mapreduce-user mailing list archives

From Mu Qiao <qiao...@gmail.com>
Subject Re: Combiner Problem
Date Tue, 07 Jul 2009 04:04:35 GMT
It's clear now. Thank you very much.

2009/7/7 Owen O'Malley <omalley@apache.org>

> On Jul 5, 2009, at 11:34 PM, Mu Qiao wrote:
>> There is a property min.num.spills.for.combine specifying the minimum
>> number of spills required to run the combiner when merging. The default
>> value is 3. Why is there such a restriction? Wouldn't it be better to run
>> the combiner no matter how many spills there are?
> Clearly the combiner isn't useful if there is only 1 spill, and 3 is a guess
> about how many are necessary before the cost of applying the combiner is
> paid for by the resulting compression. Feel free to set it to 2.
>> The second question is why the combiner can be run on the reduce side.
>> Can't the reduce function take its place?
> The combiner is only called on the reduce side if there are enough
> spills that more than a single merge is required before the data can go to
> the reduce. (The reduce is only called once, at the end.) So if the reduce
> has 1000 streams to merge, it will use the combiner on the intermediate
> merges before they are written to disk.
> -- Owen
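The threshold in Owen's first answer can be illustrated with a small model. The Python sketch below is an illustration under stated assumptions, not Hadoop code: spills are lists of `(key, value)` pairs already sorted by key, the combiner folds an iterable of values into one value (`sum` for a word count), and `merge_spills` and `min_spills` are hypothetical names. Only the property name `min.num.spills.for.combine` and its default of 3 come from the thread.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Default of min.num.spills.for.combine as quoted in the thread.
DEFAULT_MIN_SPILLS_FOR_COMBINE = 3


def merge_spills(spills, combiner=None, min_spills=DEFAULT_MIN_SPILLS_FOR_COMBINE):
    """Merge key-sorted spills into one key-sorted list of (key, value) records.

    Hypothetical model: the combiner is applied during the merge only when
    there are at least `min_spills` spills; otherwise every record from every
    spill is written out unchanged.
    """
    merged = heapq.merge(*spills, key=itemgetter(0))
    if combiner is None or len(spills) < min_spills:
        return list(merged)  # plain merge, no compression
    # Combine all values that share a key into a single record.
    return [(key, combiner(v for _, v in group))
            for key, group in groupby(merged, key=itemgetter(0))]
```

For example, with three spills `merge_spills(spills, combiner=sum)` collapses repeated keys, while with two spills the records pass through untouched unless `min_spills=2` is given, which corresponds to Owen's suggestion of lowering the property to 2.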
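Owen's second answer can be modeled the same way. This is a simplified sketch, not how Hadoop's reduce task is actually implemented: `merge_factor` is a hypothetical stand-in for the maximum number of streams merged in one pass, batches are merged in arrival order rather than by size, and `combine_merge` and `reduce_side` are illustrative names. Intermediate passes apply the combiner before "writing back to disk"; the final pass feeds the reducer, which is called once per key.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def combine_merge(streams, combiner):
    """Merge key-sorted streams and fold each key's values with the combiner."""
    merged = heapq.merge(*streams, key=itemgetter(0))
    return [(key, combiner(v for _, v in group))
            for key, group in groupby(merged, key=itemgetter(0))]


def reduce_side(streams, combiner, reducer, merge_factor=10):
    """Merge streams in passes of at most merge_factor, then reduce once per key.

    Returns (reduce output, number of intermediate merges that used the combiner).
    """
    streams = [list(s) for s in streams]
    intermediate_merges = 0
    while len(streams) > merge_factor:
        # Intermediate pass: merge one batch, combine it, and queue the
        # (smaller) result as a new stream, as if written back to disk.
        batch, streams = streams[:merge_factor], streams[merge_factor:]
        streams.append(combine_merge(batch, combiner))
        intermediate_merges += 1
    # Final merge goes straight to the reduce, called once for each key.
    final = heapq.merge(*streams, key=itemgetter(0))
    output = [(key, reducer(key, [v for _, v in group]))
              for key, group in groupby(final, key=itemgetter(0))]
    return output, intermediate_merges
```

With 1000 single-record streams and a merge factor of 10, this model performs 110 intermediate merges, each shrinking its batch to one record before it is queued again; with 10 or fewer streams the combiner never runs. Because the reducer only sees the data after the final merge, it cannot shrink the intermediate files, which is the gap the reduce-side combiner fills.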

Best wishes,
Qiao Mu
MOE KLINNS Lab and SKLMS Lab, Xi'an Jiaotong University
Department of Computer Science and Technology, Xi’an Jiaotong University
TEL: 15991676983
E-mail: qiaomuf@gmail.com
