hadoop-common-user mailing list archives

From Deepak Diwakar <ddeepa...@gmail.com>
Subject Re: Multiple final reduced outputs
Date Wed, 28 Jul 2010 19:56:41 GMT
Yes, Harsh. I was doing the same; I was just wondering why we don't have an option
at the master to combine them into a single file. That could be a feature (and if
it's already there, please let me know). Similar to setting the reducer class on the
job, we could set a merger/master-combiner class in the code itself.

Also, thanks David and Chaitanya for your pointers. Actually, I was wondering
more about having a built-in option to merge the output after all the reduced
outputs have been collected.

Thanks & regards
- Deepak Diwakar,
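The "cat them together" approach suggested further down the thread can be sketched in plain Java for job output that has already been copied to a local directory (for output still in HDFS, the `hadoop fs -getmerge <src> <localdst>` shell command performs a similar concatenation). This is a minimal sketch, not a Hadoop API: it uses only `java.nio.file`, the helper name `mergeParts` is hypothetical, and it assumes the zero-padded `part-NNNNN` naming shown in this thread.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MergeParts {
    // Hypothetical helper: concatenates every regular "part-*" file in
    // outputDir, in file-name order, into a single file at target.
    // Other entries (e.g. a _logs directory) are skipped by the glob.
    static void mergeParts(Path outputDir, Path target) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    parts.add(p);
                }
            }
        }
        // Zero-padded names (part-00000, part-00001, ...) sort correctly as strings.
        Collections.sort(parts);
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path p : parts) {
                Files.copy(p, out); // append this part's bytes to the merged file
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulated two-reducer output directory, written out of order on purpose.
        Path dir = Files.createTempDirectory("wordcount-out");
        Files.writeString(dir.resolve("part-00001"), "world\t1\n");
        Files.writeString(dir.resolve("part-00000"), "hello\t1\n");
        Path merged = dir.resolve("merged.txt");
        mergeParts(dir, merged);
        System.out.print(Files.readString(merged));
    }
}
```

Concatenating in name order keeps the order within each part file; with the default hash partitioning, keys are not sorted across part files, so the merged file is not globally sorted.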




On 29 July 2010 01:12, Harsh J <qwertymaniac@gmail.com> wrote:

> Concatenating them is the easiest way to get the result back as a
> single file (it's grouped/sorted anyway). For files that can't simply
> be 'cat'-ed together (headers, etc.), you may run your job with an
> explicit number of reducers, such as one (or write special tools for
> such cases, because otherwise the limited number of reducers may
> impact the processing time).
>
> Calling JobConf.setNumReduceTasks(int n) before submitting the job should do it.
>
> In case you have doubts about what 'merge' really means in the
> map-to-intermediate-to-reduce phases, this guide should explain it
> very well: http://wiki.apache.org/hadoop/HadoopMapReduce
>
> On Thu, Jul 29, 2010 at 12:57 AM, David Pellegrini
> <davidp@datawebsystems.com> wrote:
> > Perhaps I'm missing some subtlety, but that's what I would expect.  2
> > reducer nodes -> 2 outputs.  If you need them in one big file, cat them
> > together.
> >
> > my 2 cents
> >
> > David
> >
> > On 07/28/2010 12:16 PM, Deepak Diwakar wrote:
> >>
> >> I have set up a 2-node cluster and ran many jobs, including wordcount. In
> >> all the output folders I am getting two mutually exclusive output files,
> >> part-00000 and part-00001, instead of a single output. A merge should take
> >> place to produce one single output file, but that is not occurring here.
> >>
> >> Could someone point out where I am going wrong?
> >>
> >> Thanks & regards
> >> - Deepak Diwakar,
> >>
> >>
> >
>
>
>
> --
> Harsh J
> www.harshj.com
>
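David's "2 reducer nodes -> 2 outputs" and Harsh's `JobConf.setNumReduceTasks(int n)` suggestion both come down to partitioning: each map output key is routed to exactly one reduce task, and each reduce task writes its own `part-NNNNN` file. The plain-Java sketch below does not call Hadoop; it reimplements the default hash-partitioning formula, `(hash & Integer.MAX_VALUE) % numReduceTasks`, using `String.hashCode()` as a stand-in for the key's hash. Real jobs hash `Text` or other Writable keys, so the file a given word lands in will differ. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionDemo {
    // Same formula as the default hash partitioner: a non-negative hash
    // modulo the number of reduce tasks. String.hashCode() is a stand-in.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups keys by the output file their reduce task would write,
    // using the part-00000, part-00001, ... naming seen in this thread.
    static Map<String, List<String>> assign(List<String> keys, int numReduceTasks) {
        Map<String, List<String>> files = new TreeMap<>();
        // Every reduce task writes its own file, even if it receives no keys.
        for (int i = 0; i < numReduceTasks; i++) {
            files.put(String.format("part-%05d", i), new ArrayList<>());
        }
        for (String key : keys) {
            String name = String.format("part-%05d", partitionFor(key, numReduceTasks));
            files.get(name).add(key);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> words = List.of("hadoop", "map", "reduce", "merge", "output");
        System.out.println("2 reducers: " + assign(words, 2));
        System.out.println("1 reducer:  " + assign(words, 1));
    }
}
```

With two reduce tasks the words are split across part-00000 and part-00001; with one, every key lands in part-00000. That is the single-file result asked about in this thread, at the cost of doing all the reduce work in one task.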
