hadoop-hdfs-user mailing list archives

From Hemanth Yamijala <yhema...@thoughtworks.com>
Subject Re: map reduce and sync
Date Sat, 23 Feb 2013 14:54:15 GMT
Hi Lucas,

I tried something like this but got different results.

I wrote code that opened a file on HDFS, wrote a line and called sync.
Without closing the file, I ran a wordcount with that file as input. It
worked fine and was able to count the words that were sync'ed (even though
the file length is reported as 0, as you noted with fs -ls).
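
For reference, a minimal sketch of that writer pattern (write a line, sync,
leave the stream open). This is not the code used in the test above. So that
it runs without a cluster, it takes any OutputStream plus a sync callback;
the class name SyncedLineWriter and the SyncAction interface are hypothetical
names made up for illustration. On Hadoop 1.x the stream would be the
FSDataOutputStream returned by FileSystem.create(path), and the callback
would call its sync() method.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class SyncedLineWriter {

    /** Hypothetical callback; on Hadoop 1.x it would wrap FSDataOutputStream.sync(). */
    public interface SyncAction {
        void sync() throws IOException;
    }

    private final OutputStream out;
    private final SyncAction syncAction;

    public SyncedLineWriter(OutputStream out, SyncAction syncAction) {
        this.out = out;
        this.syncAction = syncAction;
    }

    /** Writes one line and syncs it; the stream is deliberately left open. */
    public void writeLine(String line) throws IOException {
        out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
        syncAction.sync();
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the stream an HDFS client would hold open.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        SyncedLineWriter writer = new SyncedLineWriter(sink, sink::flush);
        writer.writeLine("hello hadoop hello");
        // The stream is still open here; a reader or job would be started now.
        System.out.println(sink.size() + " bytes written and synced");
    }
}
```

With a real FSDataOutputStream named out (an assumed variable), the wiring
would be new SyncedLineWriter(out, out::sync), and out.close() would only be
called when the writing process shuts down.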

So, not sure what's happening in your case. In the MR job, do the job
counters indicate no bytes were read?

On a different note though, if you can describe a little more what you are
trying to accomplish, we could probably work out a better solution.


On Sat, Feb 23, 2013 at 7:15 PM, Lucas Bernardi <lucejb@gmail.com> wrote:

> Hello Hemanth, thanks for answering.
> The file is opened by a separate process that is not MapReduce-related at
> all. You can think of it as a servlet that receives requests and writes
> them to this file: every time a request is received, it is written and
> org.apache.hadoop.fs.FSDataOutputStream.sync() is invoked.
> At the same time, I want to run a map reduce job over this file. Simply
> running the word count example doesn't seem to work; it is as if the file
> were empty.
> hadoop fs -tail works just fine, and reading the file using
> org.apache.hadoop.fs.FSDataInputStream also works ok.
> One last thing: the web interface doesn't show the contents, and the
> command hadoop fs -ls says the file is empty.
> What am I doing wrong?
> Thanks!
> Lucas
> On Sat, Feb 23, 2013 at 4:37 AM, Hemanth Yamijala <
> yhemanth@thoughtworks.com> wrote:
>> Could you please clarify, are you opening the file in your mapper code
>> and reading from there?
>> Thanks
>> Hemanth
>> On Friday, February 22, 2013, Lucas Bernardi wrote:
>>> Hello there, I'm trying to use hadoop map reduce to process an open
>>> file. The writing process writes a line to the file and syncs the file
>>> to readers
>>> (org.apache.hadoop.fs.FSDataOutputStream.sync()).
>>> If I try to read the file from another process, it works fine, at least
>>> using
>>> org.apache.hadoop.fs.FSDataInputStream.
>>> hadoop fs -tail also works just fine.
>>> But it looks like map reduce doesn't read any data. I tried using the
>>> word count example, same thing; it is as if the file were empty for the
>>> map reduce framework.
>>> I'm using Hadoop 1.0.3 and Pig 0.10.0.
>>> I need some help around this.
>>> Thanks!
>>> Lucas
