hadoop-common-user mailing list archives

From Lucas Bernardi <luc...@gmail.com>
Subject Re: map reduce and sync
Date Sat, 23 Feb 2013 22:03:02 GMT
That is exactly what I did, but in my case it is as if the file were
empty: the job counters say no bytes were read.
I'm using Hadoop 1.0.3; which version did you try?

What I'm trying to do is just some basic analytics on a product search
system. There is a search service; every time a user performs a search, the
search string and the results are appended to this file, and the file is
sync'ed. I'm actually using Pig to do some basic counts, and it doesn't
work, as I described, because the file looks empty to the map reduce
components. I thought the problem was in Pig, but I wasn't sure, so I tried
a simple MR job, using the word count example, to test whether the map
reduce components actually see the sync'ed bytes.
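Roughly, the mismatch I mean can be shown with a sketch like this (untested, hypothetical path, Hadoop 1.x API; requires a reachable HDFS cluster): the NameNode-reported length lags behind the sync'ed data, while a plain reader streaming the file sees all of it.

```java
// Hedged sketch: compares the length reported by the NameNode with the
// number of bytes actually readable from a file that is still open for write.
// The path "/logs/searches.log" is hypothetical. Hadoop 1.x API.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SyncedLengthCheck {
    public static void main(String[] args) throws IOException {
        Path path = new Path("/logs/searches.log"); // hypothetical path
        FileSystem fs = FileSystem.get(new Configuration());

        // Length as reported by the NameNode (what fs -ls shows, and what
        // MR input split computation is based on). For a file that is still
        // open, this can lag behind the sync'ed data, or read as 0.
        long reportedLen = fs.getFileStatus(path).getLen();

        // Bytes actually readable by streaming through the file.
        long readableLen = 0;
        FSDataInputStream in = fs.open(path);
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) > 0) {
            readableLen += n;
        }
        in.close();

        System.out.println("reported=" + reportedLen
                + " readable=" + readableLen);
    }
}
```

If MR sees zero input, the likely suspect is that split computation used the stale reported length rather than the readable bytes.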

Of course, if I close the file, everything works perfectly, but I don't
want to close the file periodically, since that would mean creating a new
one each time (there is no append support), and that would end up with too
many tiny files, something we know is bad for MR performance. I also don't
want to add more parts to this (like a file-merging tool). I think using
sync is a clean solution, since we don't care about write performance, so
I'd rather keep it like this if I can make it work.
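To be concrete, the writer side looks roughly like this (a sketch, not our exact code; the path and record format are made up; Hadoop 1.x API):

```java
// Hedged sketch of the writer side: keep one file open for the lifetime of
// the process and sync() after every record, so readers can see the bytes
// without the file ever being closed. Hypothetical path and record format.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SearchLogWriter {
    private final FSDataOutputStream out;

    public SearchLogWriter(FileSystem fs, Path path) throws IOException {
        // File stays open for the lifetime of the writing process.
        this.out = fs.create(path);
    }

    // Called once per search request: one tab-separated line per record.
    public void log(String query, String results) throws IOException {
        out.writeBytes(query + "\t" + results + "\n");
        out.sync(); // flush to the datanodes so concurrent readers see it
    }

    public void close() throws IOException {
        out.close();
    }
}
```

Usage would be something like `new SearchLogWriter(FileSystem.get(new Configuration()), new Path("/logs/searches.log"))`, calling `log(...)` from the request handler.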

Any idea besides hadoop version?



On Sat, Feb 23, 2013 at 11:54 AM, Hemanth Yamijala <
yhemanth@thoughtworks.com> wrote:

> Hi Lucas,
> I tried something like this but got different results.
> I wrote code that opened a file on HDFS, wrote a line and called sync.
> Without closing the file, I ran a wordcount with that file as input. It did
> work fine and was able to count the words that were sync'ed (even though
> the file length seems to come as 0, as you noted in fs -ls).
> So, not sure what's happening in your case. In the MR job, do the job
> counters indicate no bytes were read?
> On a different note though, if you can describe a little more what you are
> trying to accomplish, we could probably work a better solution.
> Thanks
> hemanth
> On Sat, Feb 23, 2013 at 7:15 PM, Lucas Bernardi <lucejb@gmail.com> wrote:
>> Hello Hemanth, thanks for answering.
>> The file is opened by a separate process, not map reduce related at all. You
>> can think of it as a servlet receiving requests and writing them to this
>> file; every time a request is received, it is written and
>> org.apache.hadoop.fs.FSDataOutputStream.sync() is invoked.
>> At the same time, I want to run a map reduce job over this file. Simply
>> running the word count example doesn't seem to work; it is as if the file
>> were empty.
>> hadoop fs -tail works just fine, and reading the file using
>> org.apache.hadoop.fs.FSDataInputStream also works fine.
>> Last thing: the web interface doesn't show the contents, and the command
>> hadoop fs -ls says the file is empty.
>> What am I doing wrong?
>> Thanks!
>> Lucas
>> On Sat, Feb 23, 2013 at 4:37 AM, Hemanth Yamijala <
>> yhemanth@thoughtworks.com> wrote:
>>> Could you please clarify, are you opening the file in your mapper code
>>> and reading from there ?
>>> Thanks
>>> Hemanth
>>> On Friday, February 22, 2013, Lucas Bernardi wrote:
>>>> Hello there, I'm trying to use hadoop map reduce to process an open
>>>> file. The writing process writes a line to the file and syncs the
>>>> file to readers
>>>> (org.apache.hadoop.fs.FSDataOutputStream.sync()).
>>>> If I try to read the file from another process, it works fine, at least
>>>> using
>>>> org.apache.hadoop.fs.FSDataInputStream.
>>>> hadoop fs -tail also works just fine.
>>>> But it looks like map reduce doesn't read any data. I tried using the
>>>> word count example, same thing: it is as if the file were empty to the
>>>> map reduce framework.
>>>> I'm using Hadoop 1.0.3 and Pig 0.10.0.
>>>> I need some help around this.
>>>> Thanks!
>>>> Lucas
