hadoop-mapreduce-user mailing list archives

From Lucas Bernardi <luc...@gmail.com>
Subject Re: map reduce and sync
Date Sat, 23 Feb 2013 13:45:51 GMT
Hello Hemanth, thanks for answering.
The file is opened by a separate process, not MapReduce related at all. You
can think of it as a servlet: it receives requests and writes them to this
file; every time a request is received, the line is written and
org.apache.hadoop.fs.FSDataOutputStream.sync() is invoked.
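To make the setup concrete, the writer side is roughly the following (a minimal sketch; the class name, path, and line contents are placeholders I made up, but the create/write/sync pattern is what I described). It needs a running HDFS cluster to actually execute:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical writer: one long-lived process that appends request lines
// to a single HDFS file and never closes it while the system is running.
public class RequestLogger {
    private final FSDataOutputStream out;

    public RequestLogger(String pathStr) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // The file stays open across requests.
        out = fs.create(new Path(pathStr));
    }

    // Called once per incoming request.
    public synchronized void log(String line) throws Exception {
        out.writeBytes(line + "\n");
        // Push the buffered data out so other readers can see it
        // (on Hadoop 1.0.3 this is FSDataOutputStream.sync()).
        out.sync();
    }
}
```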

At the same time, I want to run a MapReduce job over this file. Simply
running the word count example doesn't seem to work; it is as if the file
were empty.

hadoop fs -tail works just fine, and reading the file using
org.apache.hadoop.fs.FSDataInputStream also works ok.
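The reader that does see the data is essentially this (again a sketch with an assumed path argument; it needs an HDFS cluster to run):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical standalone reader: opens the still-being-written file
// directly with FSDataInputStream and prints what is there.
public class DirectReader {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataInputStream in = fs.open(new Path(args[0]));
        BufferedReader reader = new BufferedReader(new InputStreamReader(in));
        String line;
        // This sees every line the writer has sync()ed, even though the
        // writer still holds the file open -- unlike the MapReduce job.
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
        reader.close();
    }
}
```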

Last thing: the web interface doesn't see the contents, and hadoop fs -ls
says the file is empty.

What am I doing wrong?



On Sat, Feb 23, 2013 at 4:37 AM, Hemanth Yamijala <yhemanth@thoughtworks.com
> wrote:

> Could you please clarify: are you opening the file in your mapper code and
> reading from there?
> Thanks
> Hemanth
> On Friday, February 22, 2013, Lucas Bernardi wrote:
>> Hello there, I'm trying to use hadoop map reduce to process an open file.
>> The writing process, writes a line to the file and syncs the file to
>> readers.
>> (org.apache.hadoop.fs.FSDataOutputStream.sync()).
>> If I try to read the file from another process, it works fine, at least
>> using
>> org.apache.hadoop.fs.FSDataInputStream.
>> hadoop fs -tail also works just fine.
>> But it looks like map reduce doesn't read any data. I tried using the
>> word count example, same thing: it is as if the file were empty to the
>> map reduce framework.
>> I'm using hadoop 1.0.3. and pig 0.10.0
>> I need some help around this.
>> Thanks!
>> Lucas
