Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of hemanty@thoughtworks.com
 designates 64.18.0.145 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CALtSBbYvPTG6jykBfy+DveQhrW74iDFuz+JnBjgY_LMFr1haUQ@mail.gmail.com>
References: 
 <CALtSBbaYkmBEcBbtG3yMFFAFNsJ1hCdZeatDLghn3xzBR67Y=A@mail.gmail.com>
	<CAEAKFL8ieWsWShqCa+zJrwhETZpEnP2uBi+-5fRbdVSygch5_w@mail.gmail.com>
	<CALtSBbYvPTG6jykBfy+DveQhrW74iDFuz+JnBjgY_LMFr1haUQ@mail.gmail.com>
Date: Sat, 23 Feb 2013 20:24:15 +0530
Message-ID: 
 <CAEAKFL_xEWW1rKiZmvk_9CxrDWe_L+7=bGU8qjmM7z3ERvoYjQ@mail.gmail.com>
Subject: Re: map reduce and sync
From: Hemanth Yamijala <yhemanth@thoughtworks.com>
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Content-Type: multipart/alternative; boundary=e89a8fb2024457b5cf04d6657bab

--e89a8fb2024457b5cf04d6657bab
Content-Type: text/plain; charset=ISO-8859-1

Hi Lucas,

I tried something like this but got different results.

I wrote code that opened a file on HDFS, wrote a line, and called sync.
Without closing the file, I ran a wordcount job with that file as input. It
worked fine and counted the words that had been synced (even though the
file length still shows as 0 in fs -ls, as you noted).
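
For illustration only, here is a minimal sketch of the shape of that test.
It is a local-filesystem analogue written with plain java.nio, not the HDFS
code that was run against the cluster. The class and method names
(OpenFileWordCount, writeFlushThenCount, countWords) are made up for this
example, and writer.flush() merely stands in for
FSDataOutputStream.sync(); HDFS visibility rules are not the same as a
local file's, so treat this as a picture of the steps rather than proof of
HDFS behaviour.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

class OpenFileWordCount {

    /** Counts whitespace-separated words in whatever is currently readable. */
    public static Map<String, Integer> countWords(Path file) {
        Map<String, Integer> counts = new TreeMap<>();
        try {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        counts.merge(word, 1, Integer::sum);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    /** Writes one line, flushes it, and counts words before the writer is closed. */
    public static Map<String, Integer> writeFlushThenCount(String line) {
        try {
            Path file = Files.createTempFile("sync-demo", ".txt");
            try (BufferedWriter writer =
                     Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                writer.write(line);
                writer.newLine();
                // Local stand-in (assumption) for FSDataOutputStream.sync() on HDFS.
                writer.flush();
                // The writer is still open here; a separate reader opens the file.
                return countWords(file);
            } finally {
                // Runs after the writer has been closed by try-with-resources.
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2}
        System.out.println(writeFlushThenCount("hello hadoop hello"));
    }
}
```

The point of the sketch is the ordering: write, flush, then read through an
independent reader while the writer is still open.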

So I'm not sure what's happening in your case. In the MR job, do the job
counters indicate that no bytes were read?

On a different note, if you can describe in a little more detail what you
are trying to accomplish, we could probably work out a better solution.

Thanks
hemanth


On Sat, Feb 23, 2013 at 7:15 PM, Lucas Bernardi <lucejb@gmail.com> wrote:

> Hello Hemanth, thanks for answering.
> The file is held open by a separate process that is not related to map
> reduce at all. You can think of it as a servlet receiving requests and
> writing them to this file: every time a request is received, it is written
> and org.apache.hadoop.fs.FSDataOutputStream.sync() is invoked.
>
> At the same time, I want to run a map reduce job over this file. Simply
> running the word count example doesn't seem to work; it is as if the file
> were empty.
>
> hadoop fs -tail works just fine, and reading the file using
> org.apache.hadoop.fs.FSDataInputStream also works OK.
>
> One last thing: the web interface doesn't show the contents, and the
> hadoop fs -ls command reports the file as empty.
>
> What am I doing wrong?
>
> Thanks!
>
> Lucas
>
>
>
> On Sat, Feb 23, 2013 at 4:37 AM, Hemanth Yamijala <
> yhemanth@thoughtworks.com> wrote:
>
>> Could you please clarify: are you opening the file in your mapper code
>> and reading from there?
>>
>> Thanks
>> Hemanth
>>
>> On Friday, February 22, 2013, Lucas Bernardi wrote:
>>
>>> Hello there, I'm trying to use hadoop map reduce to process an open
>>> file. The writing process writes a line to the file and syncs the file
>>> to readers
>>> (org.apache.hadoop.fs.FSDataOutputStream.sync()).
>>>
>>> If I try to read the file from another process, it works fine, at least
>>> using
>>> org.apache.hadoop.fs.FSDataInputStream.
>>>
>>> hadoop fs -tail also works just fine.
>>>
>>> But it looks like map reduce doesn't read any data. I tried using the
>>> word count example and got the same thing: it is as if the file were
>>> empty for the map reduce framework.
>>>
>>> I'm using hadoop 1.0.3 and pig 0.10.0.
>>>
>>> I need some help around this.
>>>
>>> Thanks!
>>>
>>> Lucas
>>>
>>
>

--e89a8fb2024457b5cf04d6657bab--