hadoop-mapreduce-user mailing list archives

From Xin Jing <xinj...@beyondfun.net>
Subject RE: problem when using combiner and MultipleOutputFormat
Date Fri, 28 Oct 2011 06:22:04 GMT
Thanks for your answer.

I am using different classes for the combiner and the reducer. As I said in my previous
mail, when the data set is small, the job works fine and the result is correct. So I can
conclude that the logic of my job is OK, right?

I do not understand what you mean by 'Do not output to files directly from your combiner'.
Could you give me more hints? In my combiner code, I am using output.collect() to emit my
result. Am I misusing it?
________________________________________
From: Harsh J [harsh@cloudera.com]
Sent: Friday, October 28, 2011 2:11 PM
To: mapreduce-user@hadoop.apache.org
Subject: Re: problem when using combiner and MultipleOutputFormat

Xin,

You probably just need to write a special Combiner class instead of
reusing your Reducer class for combiner purposes. In an MR job, you
need to specifically guarantee that the combiner outputs the same type
of K-V pairs as the reducer's input. Do not output to files directly
from your combiner; that is why you'd need a separate class
implementation to perform the optimization.
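
The contract described above can be illustrated with a small plain-Java simulation. This is only a sketch and does not use the Hadoop API: the class CombinerSketch, its methods, and the spillSize and extraCombinePasses parameters are hypothetical names made up for illustration. It assumes a simple counting job and relies on the fact that the framework may run the combiner zero, one, or several times, so the combiner must take (key, count) pairs in and emit the same type of pairs out, leaving final output to the reducer.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerSketch {

    // Combiner: (key, count) pairs in, (key, count) pairs out.
    // Because input and output types match, it is safe to apply any number of times.
    static List<Map.Entry<String, Long>> combine(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> sums = new TreeMap<String, Long>();
        for (Map.Entry<String, Long> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Long::sum);
        }
        List<Map.Entry<String, Long>> out = new ArrayList<Map.Entry<String, Long>>();
        for (Map.Entry<String, Long> e : sums.entrySet()) {
            out.add(new SimpleEntry<String, Long>(e.getKey(), e.getValue()));
        }
        return out;
    }

    // Reducer: computes the final total per key. In a real job this is the
    // only stage that should write final (for example per-folder) output.
    static Map<String, Long> reduce(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> totals = new TreeMap<String, Long>();
        for (Map.Entry<String, Long> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Long::sum);
        }
        return totals;
    }

    // Simulates one map task: every spillSize records are combined as a "spill",
    // then the combiner is applied extraCombinePasses more times before the reduce.
    static Map<String, Long> run(List<String> keys, int spillSize, int extraCombinePasses) {
        List<Map.Entry<String, Long>> shuffled = new ArrayList<Map.Entry<String, Long>>();
        for (int i = 0; i < keys.size(); i += spillSize) {
            List<Map.Entry<String, Long>> spill = new ArrayList<Map.Entry<String, Long>>();
            for (String k : keys.subList(i, Math.min(i + spillSize, keys.size()))) {
                spill.add(new SimpleEntry<String, Long>(k, 1L));
            }
            shuffled.addAll(combine(spill));
        }
        for (int pass = 0; pass < extraCombinePasses; pass++) {
            shuffled = combine(shuffled);
        }
        return reduce(shuffled);
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "a", "c", "a", "b");
        System.out.println(run(keys, 6, 0)); // small input: one spill, combiner runs once
        System.out.println(run(keys, 2, 1)); // larger input: several spills plus a merge pass
    }
}
```

Both calls in main should print the same totals, because this combiner only pre-aggregates and never changes the record types. A combiner that writes its results somewhere else, or emits a different type than it receives, would break as soon as it runs more than once.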

On Fri, Oct 28, 2011 at 10:04 AM, Xin Jing <xinjing@beyondfun.net> wrote:
>
> Hi all,
> I am currently facing a tough problem. My job uses MultipleOutputFormat
> to write results into different folders, and I have to use a combiner to
> improve performance. In this setup the reduce phase does not work: the
> reducer does not receive any data. I searched for this issue and found a
> related topic,
> http://lucene.472066.n3.nabble.com/Combiner-and-MultipleOutputs-in-Mapreduce-td1640503.html,
> but it is not clear to me what the solution really is. Is this a
> constraint of the Hadoop framework?
> I found an interesting phenomenon: when I limit the map input to a
> small number of records (such as 10000), the reduce is OK; it receives
> data and the result is correct. But when the input is over a million
> records, the reducer receives nothing. My guess is that the combiner is
> called only once when the data is small, but multiple times when the
> data is huge.
> To summarize, how can I make the combiner work while using
> MultipleOutputFormat? Any solution or suggestion is welcome.
>
> Thanks
>



--
Harsh J


