hadoop-common-user mailing list archives

From "M. C. Srivas" <mcsri...@gmail.com>
Subject Re: Approached to combing the output of reducers
Date Sun, 24 Oct 2010 00:44:29 GMT
On Sat, Oct 23, 2010 at 4:19 PM, Steve Lewis <lordjoe2000@gmail.com> wrote:

> > I am assuming the first job outputs multiple files and that the second
> > (and I presume a map-reduce job)
>
> will assign the output intended for a single file to a single reducer (in
> some cases multiple output files might be supported, one per reducer).
> One issue is how to allow the reducer to write to some 'external file
> system', i.e. not HDFS or the instance's local file system but S3 on
> Amazon or some mounted NFS system on a stand-alone cluster.
>
     bin/hadoop jar  <jarname>  <input-dir>  <output-dir>

Thus,

    bin/hadoop jar  <jarname>   hdfs://...    file:///my/nfs/mounted/dir/...

will work, if you nfs-mount your destination dir on all the nodes in the
cluster.
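
Once the part files are sitting in the NFS-mounted output directory, combining
them is an ordinary local file operation. A minimal sketch, assuming the usual
part-r-NNNNN naming; merge_parts is a hypothetical helper name, not something
Hadoop ships:

```shell
# merge_parts OUT_DIR MERGED_FILE
# Hypothetical helper: concatenates the reducer outputs found in OUT_DIR
# (assumed to be named part-r-00000, part-r-00001, ...) into MERGED_FILE.
merge_parts() {
    out_dir=$1
    merged=$2
    # The glob expands in sorted order, so part-r-00000 comes first.
    set -- "$out_dir"/part-r-*
    # If nothing matched, the pattern is left unexpanded and names no file.
    [ -e "$1" ] || { echo "no part-r-* files in $out_dir" >&2; return 1; }
    cat "$@" > "$merged"
}
```

It would be called with the job's output directory and a destination file of
your choosing, after the job has finished. The result is ordered by reducer
number only; it is globally sorted only if the job's partitioner made it so.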



> >
>
>
> > On Oct 23, 2010, at 3:32 PM, "M. C. Srivas" <mcsrivas@gmail.com> wrote:
> >
> > > Not with HDFS, since only one process may write to a single file (and
> > > it's not visible until the file is closed). In fact, it's worse than
> > > that ... the same process that's writing that file cannot see it or
> > > read it until after it's done.
> > >
> > > If you have multiple reducers, you are simply out of luck and will have
> > > to run a separate "job" to copy the data out.
> > >
> > >
> > > On Sat, Oct 23, 2010 at 3:08 PM, Steve Lewis <lordjoe2000@gmail.com> wrote:
> > >
> > >> Once I run a map-reduce job I get output in the form of
> > >> part-r-00000 part-r-00001 ...
> > >>
> > >> In many cases the output is significantly smaller than the original
> > >> input - take the classic word count.
> > >>
> > >> In most cases I want to combine the output into a single file that may
> > >> well not live on HDFS but on a more accessible file system.
> > >>
> > >> Are there standard libraries or approaches for consolidating reducer
> > >> output?
> > >>
> > >> A second Map-Reduce job taking the output directory as an input is an
> > >> OK start, but as output there needs to be a single reducer that
> > >> writes a real file and not reduce output.
> > >>
> > >> Are there standard libraries or approaches to this?
> > >>
> > >> --
> > >> Steven M. Lewis PhD
> > >> 4221 105th Ave Ne
> > >> Kirkland, WA 98033
> > >> 206-384-1340 (cell)
> > >> Institute for Systems Biology
> > >> Seattle WA
> > >>
> >
