hadoop-mapreduce-user mailing list archives

From Niv Mizrahi <n...@taykey.com>
Subject Re: map-reduce on a none closed files
Date Mon, 05 Mar 2012 21:28:20 GMT
hi harsh,

yes, thank you, we are using the sync() API, and we are still unable to read
unclosed files in mapreduce.
we are able to cat non-closed files; would that even be possible if we
hadn't used the sync() API call?

has anybody tried running an M/R job on non-closed files ?
are we missing something ?

10x
Niv
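
[Editor's note: HDFS's sync() (later hflush()/hsync()) behaves much like flushing and fsync-ing a local file before it is closed: bytes pushed out at a sync point become visible to other readers even though the writer still holds the file open. The following is a minimal local-filesystem sketch of that visibility idea, using plain java.io rather than the HDFS API (the class and file names are illustrative, and HDFS adds its own semantics around block boundaries that this does not model):]

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SyncVisibilityDemo {
    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("sync-demo", ".log");
        try (FileOutputStream out = new FileOutputStream(p.toFile())) {
            out.write("record-1\n".getBytes("UTF-8"));
            // Analogous to calling sync() on an HDFS output stream:
            // force the buffered bytes out so other readers can see them.
            out.flush();
            out.getFD().sync();
            // The writer has NOT closed the file yet, but a second reader
            // already sees everything up to the sync point - just as
            // 'hadoop fs -cat' sees data up to the last HDFS sync point.
            String visible = new String(Files.readAllBytes(p), "UTF-8");
            System.out.println("reader sees: " + visible.trim());
        }
        Files.deleteIfExists(p);
    }
}
```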



On Mon, Mar 5, 2012 at 3:42 PM, Harsh J <harsh@cloudera.com> wrote:

> Niv,
>
> Did you also try the sync() approach I mentioned? Did that not work?
> CDH3u2 does have the sync() API in it, so you can use it right away.
>
> On Sun, Mar 4, 2012 at 11:26 PM, Niv Mizrahi <nivm@taykey.com> wrote:
> > hi harsh,
> >
> > thank you for the quick response.
> > we are currently running with cdh3u2.
> >
> > i have run map-reduces in several forms on non-closed files:
> >  1. streaming with -mapper /bin/cat
> >  2. word count
> >  3. our own java job.
> >
> > the output parts are always empty, although the jobs end successfully.
> >
> > running hadoop fs -cat on the same input returns results.
> >
> > am i doing something wrong ?
> >
> > niv
> >
> >
> >
> > On Sun, Mar 4, 2012 at 6:49 PM, Harsh J <harsh@cloudera.com> wrote:
> >>
> >> Technically, yes, you can run MR jobs on non-closed files (it'll run
> >> the reader in the same way as your -cat), but you would only be able
> >> to read up to the last complete block, or up to the point sync() was
> >> called on the output stream.
> >>
> >> It is better if your file-writer uses the sync() API judiciously to
> >> mark sync points after a considerable number of records, so that your
> >> MR readers in tasks read whole records rather than stopping at block
> >> boundaries.
> >>
> >> For a description on sync() API, read the section 'Coherency Model' in
> >> Tom White's "Hadoop: The Definitive Guide" (O'Reilly), Page 68.
> >>
> >> On Sun, Mar 4, 2012 at 8:07 PM, Niv Mizrahi <nivm@taykey.com> wrote:
> >> > hi all,
> >> >
> >> >  we are looking for a way to map-reduce over non-closed files.
> >> >  we are currently able to run
> >> > hadoop fs -cat <non-closed-file>
> >> >
> >> > non-closed files - files that are currently being written to and
> >> > have not been closed yet.
> >> >
> >> > is there any way to run map-reduce on non-closed files ?
> >> >
> >> >
> >> > 10x in advance for any answer
> >> > --
> >> > Niv Mizrahi
> >> > Taykey | www.taykey.com
> >> >
> >>
> >>
> >>
> >> --
> >> Harsh J
> >
> >
> >
> >
> > --
> > Niv Mizrahi
> > Taykey | www.taykey.com
> >
>
>
>
> --
> Harsh J
>



-- 
*Niv Mizrahi*
Taykey | www.taykey.com
