hadoop-common-user mailing list archives

From Steve Lewis <lordjoe2...@gmail.com>
Subject Re: Approached to combing the output of reducers
Date Sat, 23 Oct 2010 23:19:10 GMT
>
> I am assuming the first job outputs multiple files and that the second (and
> I presume a map-reduce job)

will assign the output intended for a single file to a single reducer (in
some cases multiple output files might be supported, one per reducer). One
issue is how to allow the reducer to write to some 'external file system',
i.e. not HDFS or the instance's local file system but S3 on Amazon or some
mounted NFS system on a standalone cluster.
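
To make the consolidation step concrete, here is a minimal sketch. It assumes
the part files have already been copied to, or are visible through, a locally
mounted path (for example an NFS mount). The helper name `merge_part_files`
and the paths are hypothetical and not part of Hadoop. It simply concatenates
the part-r-* files in name order into one file. For output still sitting on
HDFS, `hadoop fs -getmerge` performs a similar concatenation onto the local
file system.

```python
import shutil
from pathlib import Path


def merge_part_files(output_dir, merged_path, pattern="part-r-*"):
    """Concatenate reducer part files, in name order, into a single file.

    Hypothetical helper: assumes output_dir is an ordinary mounted directory
    (local disk or NFS), not an hdfs:// or s3:// URI.
    Returns the names of the part files that were merged.
    """
    # Zero-padded names (part-r-00000, part-r-00001, ...) sort correctly as text.
    parts = sorted(p for p in Path(output_dir).glob(pattern) if p.is_file())

    merged = Path(merged_path)
    merged.parent.mkdir(parents=True, exist_ok=True)

    with merged.open("wb") as dst:
        for part in parts:
            with part.open("rb") as src:
                # Stream the bytes so a large part file is never held in memory.
                shutil.copyfileobj(src, dst)

    return [p.name for p in parts]
```

Calling `merge_part_files("/mnt/job-output", "/mnt/shared/wordcount.txt")`
(both paths are placeholders) would leave marker files such as _SUCCESS
untouched, since they do not match the part-r-* pattern.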

>


> On Oct 23, 2010, at 3:32 PM, "M. C. Srivas" <mcsrivas@gmail.com> wrote:
>
> > Not with HDFS, since only one process may write to a single file (and it's
> > not visible until the file is closed). In fact, it's worse than that ...
> > the same process that's writing that file cannot see it or read it until
> > after it's done.
> >
> > If you have multiple reducers, you are simply out of luck and will have
> > to run a separate "job" to copy the data out.
> >
> >
> > On Sat, Oct 23, 2010 at 3:08 PM, Steve Lewis <lordjoe2000@gmail.com> wrote:
> >
> >> Once I run a map-reduce job I get output in the form of
> >> part-r-00000 part-r-00001 ...
> >>
> >> In many cases the output is significantly smaller than the original
> >> input - take the classic word count.
> >>
> >> In most cases I want to combine the output into a single file that may
> >> well not live on HDFS but on a more accessible file system.
> >>
> >> Are there standard libraries or approaches for consolidating reducer
> >> output?
> >>
> >> A second Map-Reduce job taking the output directory as an input is an OK
> >> start, but as output there needs to be a single reducer that
> >> writes a real file and not reduce output.
> >>
> >> Are there standard libraries or approaches to this?
> >>
> >> --
> >> Steven M. Lewis PhD
> >> 4221 105th Ave Ne
> >> Kirkland, WA 98033
> >> 206-384-1340 (cell)
> >> Institute for Systems Biology
> >> Seattle WA
> >>
>



-- 
Steven M. Lewis PhD
4221 105th Ave Ne
Kirkland, WA 98033
206-384-1340 (cell)
Institute for Systems Biology
Seattle WA
