hadoop-mapreduce-user mailing list archives

From Niv Mizrahi <n...@taykey.com>
Subject Re: map-reduce on non-closed files
Date Sun, 04 Mar 2012 17:56:16 GMT
Hi Harsh,

Thank you for the quick response.
We are currently running CDH3u2.

I have run MapReduce jobs in several forms on non-closed files:
 1. streaming with -mapper /bin/cat
 2. the word-count example
 3. our own Java job.
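
For reference, the streaming run in (1) would look roughly like this (a sketch only; the jar path and HDFS directories are assumptions for a CDH3-era install, not taken from the thread):

```shell
# Illustrative streaming invocation: an identity mapper over a file
# that is still open for writing. Jar location and paths are assumed.
hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-*.jar \
  -input /logs/open-file.log \
  -output /tmp/cat-out \
  -mapper /bin/cat \
  -reducer NONE
```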

The output part files are always empty, although the jobs complete successfully.

Running hadoop fs -cat on the same input returns results.

Am I doing something wrong?

Niv



On Sun, Mar 4, 2012 at 6:49 PM, Harsh J <harsh@cloudera.com> wrote:

> Technically, yes, you can run MR jobs on non-closed files (it'll run
> the reader in the same way as your -cat), but you would only be able
> to read up to the last complete block, or up to the point where sync()
> was called on the output stream.
>
> It is better if your file-writer uses the sync() API judiciously to
> mark sync points after a considerable number of records, so that your
> MR readers in tasks read whole records and not just up to block
> boundaries.
>
> For a description on sync() API, read the section 'Coherency Model' in
> Tom White's "Hadoop: The Definitive Guide" (O'Reilly), Page 68.
>
> On Sun, Mar 4, 2012 at 8:07 PM, Niv Mizrahi <nivm@taykey.com> wrote:
> > Hi all,
> >
> > We are looking for a way to run map-reduce on non-closed files.
> > We are currently able to run
> > hadoop fs -cat <non-closed-file>
> >
> > Non-closed files: files that are currently being written to and have
> > not been closed yet.
> >
> > Is there any way to run map-reduce on non-closed files?
> >
> >
> > Thanks in advance for any answer
> > --
> > Niv Mizrahi
> > Taykey | www.taykey.com
> >
>
>
>
> --
> Harsh J
>
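
The sync() usage Harsh describes might look roughly like this on the writer side (a minimal sketch against the CDH3-era FileSystem API; the path and record contents are made up for illustration, and in Hadoop 0.21+/2.x sync() was superseded by hflush()):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SyncingWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical output path, for illustration only.
        Path path = new Path("/tmp/open-file.log");
        FSDataOutputStream out = fs.create(path);

        for (int i = 0; i < 100000; i++) {
            out.writeBytes("record-" + i + "\n");

            // Mark a sync point every 1000 records so that readers
            // (hadoop fs -cat, or MR map tasks) can see whole records
            // up to this point even though the file is still open.
            if (i % 1000 == 0) {
                out.sync(); // hflush() in Hadoop 0.21+ / 2.x
            }
        }
        // Deliberately not closing: without close(), readers see data
        // only up to the last complete block or the last sync point.
    }
}
```

Without those sync points, a map task reading the open file may legitimately see zero bytes even though -cat (issued later) sees data, which matches the empty output parts described above.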



-- 
*Niv Mizrahi*
Taykey | www.taykey.com
