hadoop-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: map reduce and sync
Date Mon, 25 Feb 2013 07:31:49 GMT
Just an aside (I've not tried to look at the original issue yet), but
Scribe has not been maintained (nor seen a release) in over a year
now, judging by the commit history. The same goes for both Facebook's
and Twitter's forks.

On Mon, Feb 25, 2013 at 7:16 AM, Lucas Bernardi <lucejb@gmail.com> wrote:
> Yeah, I looked at Scribe; it looks good, but it sounds like too much for my
> problem. I'd rather make it work the simple way. Could you please post your
> code? Maybe I'm doing something wrong on the sync side. Maybe a buffer size,
> block size or some other parameter is different...
>
> Thanks!
> Lucas
>
>
> On Sun, Feb 24, 2013 at 10:31 PM, Hemanth Yamijala
> <yhemanth@thoughtworks.com> wrote:
>>
>> I am using the same version of Hadoop as you.
>>
>> Could you look at something like Scribe, which AFAIK fits the use case you
>> describe?
>>
>> Thanks
>> Hemanth
>>
>>
>> On Sun, Feb 24, 2013 at 3:33 AM, Lucas Bernardi <lucejb@gmail.com> wrote:
>>>
>>> That is exactly what I did, but in my case it is as if the file were
>>> empty: the job counters say no bytes were read.
>>> I'm using Hadoop 1.0.3. Which version did you try?
>>>
>>> What I'm trying to do is just some basic analytics on a product search
>>> system. There is a search service; every time a user performs a search, the
>>> search string and the results are stored in this file, and the file is
>>> sync'ed. I'm actually using Pig to do some basic counts, and it doesn't
>>> work, as I described, because the file looks empty to the map reduce
>>> components. I thought it was a Pig issue, but I wasn't sure, so I tried a
>>> simple MR job and used the word count example to test whether the map
>>> reduce components actually see the sync'ed bytes.
>>>
>>> Of course if I close the file, everything works perfectly, but I don't
>>> want to close the file every so often, since that means I would have to
>>> create another one (since there is no append support), and that would end
>>> up with too many tiny files, something we know is bad for MR performance,
>>> and I don't want to add more parts to this (like a file merging tool). I
>>> think using sync is a clean solution, since we don't care about writing
>>> performance, so I'd rather keep it like this if I can make it work.
>>>
>>> Any idea besides hadoop version?
>>>
>>> Thanks!
>>>
>>> Lucas
>>>
>>>
>>>
>>> On Sat, Feb 23, 2013 at 11:54 AM, Hemanth Yamijala
>>> <yhemanth@thoughtworks.com> wrote:
>>>>
>>>> Hi Lucas,
>>>>
>>>> I tried something like this but got different results.
>>>>
>>>> I wrote code that opened a file on HDFS, wrote a line and called sync.
>>>> Without closing the file, I ran a wordcount with that file as input. It
>>>> worked fine and was able to count the words that were sync'ed (even though
>>>> the file length seems to come back as 0, as you noted in fs -ls).
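>>>>
>>>> In case it helps, here is a minimal sketch of the kind of writer I tested
>>>> with (not my exact code; the path and sample text are placeholders, and it
>>>> assumes Hadoop 1.x, where FSDataOutputStream still exposes sync()):
>>>>
>>>> import org.apache.hadoop.conf.Configuration;
>>>> import org.apache.hadoop.fs.FSDataOutputStream;
>>>> import org.apache.hadoop.fs.FileSystem;
>>>> import org.apache.hadoop.fs.Path;
>>>>
>>>> public class SyncWriter {
>>>>     public static void main(String[] args) throws Exception {
>>>>         FileSystem fs = FileSystem.get(new Configuration());
>>>>         // Placeholder path; point the wordcount job at this file.
>>>>         FSDataOutputStream out = fs.create(new Path("/tmp/sync-test.txt"));
>>>>         out.writeBytes("hello world hello\n");
>>>>         // Flush the written bytes to readers without closing the file.
>>>>         out.sync();
>>>>         // Deliberately no close(); keep the process (and the file) open
>>>>         // while the wordcount runs.
>>>>         Thread.sleep(Long.MAX_VALUE);
>>>>     }
>>>> }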
>>>>
>>>> So I'm not sure what's happening in your case. In the MR job, do the job
>>>> counters indicate that no bytes were read?
>>>>
>>>> On a different note though, if you can describe a little more of what you
>>>> are trying to accomplish, we could probably work out a better solution.
>>>>
>>>> Thanks
>>>> hemanth
>>>>
>>>>
>>>> On Sat, Feb 23, 2013 at 7:15 PM, Lucas Bernardi <lucejb@gmail.com>
>>>> wrote:
>>>>>
>>>>> Hello Hemanth, thanks for answering.
>>>>> The file is opened by a separate process, not map reduce related at all.
>>>>> You can think of it as a servlet receiving requests and writing them to
>>>>> this file: every time a request is received, it is written and
>>>>> org.apache.hadoop.fs.FSDataOutputStream.sync() is invoked.
>>>>>
>>>>> At the same time, I want to run a map reduce job over this file. Simply
>>>>> running the word count example doesn't seem to work; it is as if the file
>>>>> were empty.
>>>>>
>>>>> hadoop fs -tail works just fine, and reading the file using
>>>>> org.apache.hadoop.fs.FSDataInputStream also works OK.
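>>>>>
>>>>> For example, a direct read along these lines (the path here is just a
>>>>> placeholder for illustration) does see the sync'ed lines:
>>>>>
>>>>> import java.io.BufferedReader;
>>>>> import java.io.InputStreamReader;
>>>>> import org.apache.hadoop.conf.Configuration;
>>>>> import org.apache.hadoop.fs.FSDataInputStream;
>>>>> import org.apache.hadoop.fs.FileSystem;
>>>>> import org.apache.hadoop.fs.Path;
>>>>>
>>>>> public class SyncReader {
>>>>>     public static void main(String[] args) throws Exception {
>>>>>         FileSystem fs = FileSystem.get(new Configuration());
>>>>>         // Placeholder path; this is the file the writer keeps open.
>>>>>         FSDataInputStream in = fs.open(new Path("/tmp/searches.log"));
>>>>>         BufferedReader reader =
>>>>>                 new BufferedReader(new InputStreamReader(in));
>>>>>         String line;
>>>>>         while ((line = reader.readLine()) != null) {
>>>>>             System.out.println(line); // sync'ed lines do show up here
>>>>>         }
>>>>>         reader.close();
>>>>>     }
>>>>> }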
>>>>>
>>>>> Last thing: the web interface doesn't show the contents, and the command
>>>>> hadoop fs -ls says the file is empty.
>>>>>
>>>>> What am I doing wrong?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Lucas
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Feb 23, 2013 at 4:37 AM, Hemanth Yamijala
>>>>> <yhemanth@thoughtworks.com> wrote:
>>>>>>
>>>>>> Could you please clarify: are you opening the file in your mapper code
>>>>>> and reading from there?
>>>>>>
>>>>>> Thanks
>>>>>> Hemanth
>>>>>>
>>>>>> On Friday, February 22, 2013, Lucas Bernardi wrote:
>>>>>>>
>>>>>>> Hello there, I'm trying to use hadoop map reduce to process an open
>>>>>>> file. The writing process writes a line to the file and syncs the file
>>>>>>> to readers
>>>>>>> (org.apache.hadoop.fs.FSDataOutputStream.sync()).
>>>>>>>
>>>>>>> If I try to read the file from another process, it works fine, at
>>>>>>> least using org.apache.hadoop.fs.FSDataInputStream.
>>>>>>>
>>>>>>> hadoop fs -tail also works just fine
>>>>>>>
>>>>>>> But it looks like map reduce doesn't read any data. I tried using the
>>>>>>> word count example: same thing, it is as if the file were empty to the
>>>>>>> map reduce framework.
>>>>>>>
>>>>>>> I'm using Hadoop 1.0.3 and Pig 0.10.0.
>>>>>>>
>>>>>>> I need some help with this.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Lucas
>>>>>
>>>>>
>>>>
>>>
>>
>



--
Harsh J
