hadoop-common-dev mailing list archives

From ShiYu <sh...@uchicago.edu>
Subject Combiner and MultipleOutputs in Mapreduce
Date Wed, 06 Oct 2010 04:44:08 GMT

Hi,

Most of the example code I have read uses the following configuration, with
the same Reduce class serving as both the Combiner and the Reducer:

conf.setCombinerClass(Reduce.class);
conf.setReducerClass(Reduce.class); 

For the combiner and the reducer to both run, must the input <K2,V2> of the
Reduce class and its output <K2,V2> be the same types? I assume that
otherwise the types would be incompatible. In the current version, is it
possible to use a more complex Combiner and Reducer, for example one that
takes <K2,V2> as input and produces <K3,V3> as output?
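
To make the type question concrete, here is a minimal plain-Java sketch. It
does not use the Hadoop API: SketchReducer, runStage, SUM and FORMAT are
hypothetical names invented for this illustration. It assumes the usual rule
that a combiner's output is fed back in as reducer input, so the combiner
must emit the same <K2,V2> it consumes, while a separate reducer class is
free to turn <K2,V2> into <K3,V3>.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical stand-in for a reduce-style interface; NOT the Hadoop API.
interface SketchReducer<KIn, VIn, KOut, VOut> {
    void reduce(KIn key, Iterator<VIn> values, BiConsumer<KOut, VOut> out);
}

final class CombinerTypesSketch {
    // Combiner: consumes <String,Integer> and emits <String,Integer>,
    // so its output can be grouped and reduced again.
    static final SketchReducer<String, Integer, String, Integer> SUM =
        (key, values, out) -> {
            int sum = 0;
            while (values.hasNext()) sum += values.next();
            out.accept(key, sum);
        };

    // Reducer: consumes <String,Integer> (K2,V2), emits <String,String> (K3,V3).
    static final SketchReducer<String, Integer, String, String> FORMAT =
        (key, values, out) -> {
            int sum = 0;
            while (values.hasNext()) sum += values.next();
            out.accept(key, "count=" + sum);
        };

    static <K, V> Map.Entry<K, V> kv(K k, V v) {
        return new AbstractMap.SimpleEntry<K, V>(k, v);
    }

    // Groups the input by key (sorted) and applies one reduce-style stage.
    static <K extends Comparable<K>, V, KO, VO> List<Map.Entry<KO, VO>> runStage(
            List<Map.Entry<K, V>> input, SketchReducer<K, V, KO, VO> stage) {
        TreeMap<K, List<V>> groups = new TreeMap<>();
        for (Map.Entry<K, V> e : input) {
            groups.computeIfAbsent(e.getKey(), k -> new ArrayList<V>()).add(e.getValue());
        }
        List<Map.Entry<KO, VO>> out = new ArrayList<>();
        for (Map.Entry<K, List<V>> g : groups.entrySet()) {
            stage.reduce(g.getKey(), g.getValue().iterator(),
                (k, v) -> out.add(new AbstractMap.SimpleEntry<KO, VO>(k, v)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOut = new ArrayList<>();
        mapOut.add(kv("a", 1));
        mapOut.add(kv("b", 1));
        mapOut.add(kv("a", 1));
        // The combiner stage keeps the <String,Integer> types.
        List<Map.Entry<String, Integer>> combined = runStage(mapOut, SUM);
        // The reducer stage may change the output types.
        List<Map.Entry<String, String>> reduced = runStage(combined, FORMAT);
        System.out.println(combined); // [a=2, b=1]
        System.out.println(reduced);  // [a=count=2, b=count=1]
    }
}
```

In this model, passing FORMAT as the combiner stage and then feeding its
output to another <String,Integer> stage would not compile, which is the
incompatibility I am asking about.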

However, in some simple experiments with MultipleOutputs on Hadoop 0.19.2, I
found that when the Combiner class is set, the Reducer is never invoked. It
seems that the MultipleOutputs object takes the combiner's output, so the
Reducer receives no input. The job's default logs show "Reduce input
records=0" and "Reduce output records=0"; moreover, the number of output
files equals the number of input files. The Combiner also reports input
records but no output, i.e. "Combine output records=0". My question is:
when using a MultipleOutputs object, how do I get data to flow from the
Combiner to the Reducer?
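
The symptom can be modelled in a few lines of plain Java. This is only a
sketch and not Hadoop code: SideChannelCombinerSketch, Counters and run are
hypothetical names, and the model assumes (unconfirmed) that the shared
Reduce class writes only to a named side output and never to the collector
that the framework forwards to the next stage. Under that assumption the
counters come out as in my logs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SideChannelCombinerSketch {
    // Counters named after the job log entries quoted in this message.
    static final class Counters {
        int combineIn;
        int combineOut;
        int reduceIn;
    }

    // emitToCollector = false models a combiner that writes only to a side
    // output (a stand-in for a named output file); true models a combiner
    // that emits to the collector the framework passes on to the reducer.
    static Counters run(List<String> mapOutputKeys, boolean emitToCollector) {
        Counters c = new Counters();
        Map<String, Integer> collector = new TreeMap<>();  // forwarded to the reducer
        Map<String, Integer> sideOutput = new TreeMap<>(); // never seen by the reducer
        Map<String, Integer> sums = new TreeMap<>();

        // Combine step: count occurrences of each key.
        for (String key : mapOutputKeys) {
            c.combineIn++;
            sums.merge(key, 1, Integer::sum);
        }
        for (Map.Entry<String, Integer> e : sums.entrySet()) {
            if (emitToCollector) {
                collector.put(e.getKey(), e.getValue());
                c.combineOut++;
            } else {
                sideOutput.put(e.getKey(), e.getValue());
            }
        }
        // The reducer only ever sees what reached the collector.
        c.reduceIn = collector.size();
        return c;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "a");
        Counters lost = run(keys, false);
        Counters kept = run(keys, true);
        System.out.println("side output only: combineOut=" + lost.combineOut
            + " reduceIn=" + lost.reduceIn);
        System.out.println("via collector:    combineOut=" + kept.combineOut
            + " reduceIn=" + kept.reduceIn);
    }
}
```

If this model matches what really happens, then using a separate combiner
class that emits only to the regular collector, and keeping the
MultipleOutputs writes in the reducer, might restore the flow, but I have
not verified this.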

Thanks for any suggestions.

Shi





-- 
View this message in context: http://old.nabble.com/Combiner-and-MultipleOutputs-in-Mapreduce-tp29893459p29893459.html
Sent from the Hadoop core-dev mailing list archive at Nabble.com.

