hadoop-common-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: Multiple final reduced outputs
Date Wed, 28 Jul 2010 19:42:25 GMT
Concatenating them is the easiest way to get the result back as a
single file (each part file is already grouped and sorted by key). For
files that can't simply be 'cat' together (headers, etc.), you can run
your job with an explicit number of reducers, or write a special tool
to merge such files, since limiting the number of reducers may
increase the processing time.
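
As a rough illustration of the 'cat' approach, here is a minimal Java
sketch. It assumes the job's output directory has already been copied
to the local filesystem; the class and method names (PartFileConcat,
concat) are made up for this example and are not part of Hadoop. On
the cluster itself, the `hadoop fs -getmerge` shell command serves a
similar purpose.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not a Hadoop class: concatenates local copies
// of part-* files into one file.
final class PartFileConcat {

    /**
     * Appends every part-* file found directly in localOutputDir to
     * target, in file-name order, and returns how many files were merged.
     * The target should not itself be named part-*.
     */
    static int concat(Path localOutputDir, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(localOutputDir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        // Name order gives part-00000, part-00001, ...
        Collections.sort(parts);

        // Creates the target, or truncates it if it already exists.
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path p : parts) {
                Files.copy(p, out);
            }
        }
        return parts.size();
    }
}
```

Calling PartFileConcat.concat(Paths.get("output"), Paths.get("merged.txt"))
would write the contents of output/part-00000, output/part-00001, and so
on into merged.txt, in that order. Files with headers would still need
the special handling mentioned above.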

Calling JobConf.setNumReduceTasks(int n) before submitting the job
should do it; n = 1 yields a single output file.
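
To see why the number of reduce tasks fixes the number of part files,
here is a small stand-alone Java sketch of hash partitioning. It is an
assumption-laden illustration, not Hadoop code: PartitionSketch and its
methods are invented names, and String.hashCode() stands in for the
hash of whatever key type the job really uses. The formula mirrors the
one used by Hadoop's default HashPartitioner.

```java
import java.util.Locale;

// Hypothetical illustration, not a Hadoop class.
final class PartitionSketch {

    /** Picks the reduce task for a key by hashing, modulo the task count. */
    static int partitionFor(String key, int numReduceTasks) {
        // Masking with Integer.MAX_VALUE keeps the value non-negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Formats the output file name written by a given reduce task. */
    static String partFileName(int partition) {
        return String.format(Locale.ROOT, "part-%05d", partition);
    }

    public static void main(String[] args) {
        String[] keys = {"hadoop", "map", "reduce", "merge"};
        for (int n : new int[] {2, 1}) {
            System.out.println("numReduceTasks = " + n);
            for (String key : keys) {
                System.out.println("  " + key + " -> "
                    + partFileName(partitionFor(key, n)));
            }
        }
    }
}
```

With two reduce tasks the keys are spread over part-00000 and
part-00001; with one reduce task every key maps to partition 0, so only
part-00000 is written.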

In case you have doubts about what 'merge' really means in the
map-to-intermediate-to-reduce phases, this guide should explain it
very well: http://wiki.apache.org/hadoop/HadoopMapReduce

On Thu, Jul 29, 2010 at 12:57 AM, David Pellegrini
<davidp@datawebsystems.com> wrote:
> Perhaps I'm missing some subtlety, but that's what I would expect.  2
> reducer nodes -> 2 outputs.  If you need them in one big file, cat them
> together.
>
> my 2 cents
>
> David
>
> On 07/28/2010 12:16 PM, Deepak Diwakar wrote:
>>
>> I have set up a 2-node cluster and run many jobs, including wordcount.
>> In all the output folders I am getting two mutually exclusive output
>> files, part-00000 and part-00001, instead of a single output. A merge
>> should take place to produce one single output file, but that is not
>> occurring here.
>>
>> Could someone point out where I am going wrong?
>>
>> Thanks & regards
>> - Deepak Diwakar,
>>
>>
>



-- 
Harsh J
www.harshj.com
