hadoop-mapreduce-user mailing list archives

From GUOJUN Zhu <guojun_...@freddiemac.com>
Subject Disable combiner in mapper while keep it in reducer
Date Thu, 31 May 2012 17:27:48 GMT

We are using the old API (0.20.2) from Cloudera CDH3.  When I set a 
combiner (just reusing the reducer class), it runs in both the mapper 
and the reducer.  In the mapper, it aggregates only a couple of records 
at a time, while in the reducer it aggregates 1000 at a time.  The reducer 
has some per-call overhead, and that overhead becomes significant because 
a mapper task runs the reducer/combiner sequentially, once per group 
(i.e. once per distinct output key).  Can I turn it off in the mapper 
while keeping it on in the reducer?
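
To make the per-group overhead concrete, here is a minimal stand-alone 
sketch in plain Java.  It does not use any Hadoop classes; the class name 
CombinerCallCount, the Result holder, and the summing combine() body are 
all made up for illustration and are not our actual reducer.  It only shows 
that a combiner pass is invoked once per distinct key, so a fixed per-call 
cost is paid once per group even when each group holds only a couple of 
records.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-alone illustration; none of these names come from Hadoop.
class CombinerCallCount {

    // Result of one simulated combiner pass over a batch of map output.
    static final class Result {
        final int calls;                 // combiner invocations (one per distinct key)
        final Map<String, Long> output;  // one combined value per key

        Result(int calls, Map<String, Long> output) {
            this.calls = calls;
            this.output = output;
        }
    }

    // Hypothetical combiner body: sums all values that share one key.
    static long combine(List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    // Groups the records by key (in sorted key order) and calls combine()
    // once per group, counting the calls.
    static Result run(List<Map.Entry<String, Long>> records) {
        Map<String, List<Long>> groups = new TreeMap<>();
        for (Map.Entry<String, Long> r : records) {
            groups.computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
        }
        Map<String, Long> output = new TreeMap<>();
        int calls = 0;
        for (Map.Entry<String, List<Long>> g : groups.entrySet()) {
            output.put(g.getKey(), combine(g.getValue()));
            calls++;
        }
        return new Result(calls, output);
    }

    public static void main(String[] args) {
        // Six records spread over three keys: each call sees only two values.
        List<Map.Entry<String, Long>> records = new ArrayList<>();
        for (String key : new String[] {"a", "b", "c"}) {
            records.add(new SimpleEntry<>(key, 1L));
            records.add(new SimpleEntry<>(key, 2L));
        }
        Result r = run(records);
        System.out.println("records=" + records.size() + " combiner calls=" + r.calls);
        System.out.println("combined output=" + r.output);
    }
}
```

With many distinct keys and few records per key, the number of calls 
approaches the number of records, which is the situation I am seeing in 
the mapper.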

Zhu, Guojun
Modeling Sr Graduate
Financial Engineering
Freddie Mac