Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of lucejb@gmail.com designates
 209.85.219.42 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAEAKFL8ieWsWShqCa+zJrwhETZpEnP2uBi+-5fRbdVSygch5_w@mail.gmail.com>
References: 
 <CALtSBbaYkmBEcBbtG3yMFFAFNsJ1hCdZeatDLghn3xzBR67Y=A@mail.gmail.com>
	<CAEAKFL8ieWsWShqCa+zJrwhETZpEnP2uBi+-5fRbdVSygch5_w@mail.gmail.com>
Date: Sat, 23 Feb 2013 10:45:51 -0300
Message-ID: 
 <CALtSBbYvPTG6jykBfy+DveQhrW74iDFuz+JnBjgY_LMFr1haUQ@mail.gmail.com>
Subject: Re: map reduce and sync
From: Lucas Bernardi <lucejb@gmail.com>
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=14dae93b57beb0e71d04d6648671

--14dae93b57beb0e71d04d6648671
Content-Type: text/plain; charset=ISO-8859-1

Hello Hemanth, thanks for answering.
The file is held open by a separate process that has nothing to do with
map reduce. You can think of it as a servlet that receives requests and
writes them to this file: every time a request is received, it is written
and org.apache.hadoop.fs.FSDataOutputStream.sync() is invoked.

At the same time, I want to run a map reduce job over this file. Simply
running the word count example doesn't seem to work; it is as if the file
were empty.

hadoop fs -tail works just fine, and reading the file using
org.apache.hadoop.fs.FSDataInputStream also works.

One last thing: the web interface doesn't show the contents, and
hadoop fs -ls says the file is empty.
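
To make these symptoms concrete, here is a toy model in plain Python. It
does not use Hadoop at all, and it rests on an assumption that is not
confirmed anywhere in this thread: that the map reduce job plans its input
from the file length shown by the listing, while tail and a plain input
stream simply read whatever bytes are available. OpenFile, reported_length,
read_stream and read_split are hypothetical names made up for this sketch.

```python
class OpenFile:
    """Toy stand-in for a file that is still being written.

    data models the bytes a streaming reader can see after a sync.
    reported_length models the size shown by a directory listing.
    Neither name belongs to any Hadoop API.
    """

    def __init__(self):
        self.data = b""
        self.reported_length = 0  # assumption: stays 0 while the file is open

    def write_and_sync(self, line):
        # Synced bytes become readable, but the listed length is not updated.
        self.data += line.encode("utf-8") + b"\n"

    def close(self):
        # Assumption: the listed length catches up once the file is closed.
        self.reported_length = len(self.data)


def read_stream(f):
    """Read like a streaming reader: consume every available byte."""
    return f.data.decode("utf-8").splitlines()


def read_split(f):
    """Read like a length-planned job: only bytes [0, reported_length)."""
    return f.data[:f.reported_length].decode("utf-8").splitlines()


f = OpenFile()
f.write_and_sync("request 1")
f.write_and_sync("request 2")

stream_lines_open = read_stream(f)   # what a streaming reader sees
split_lines_open = read_split(f)     # what a length-planned reader sees

f.close()
split_lines_closed = read_split(f)

print(stream_lines_open)   # ['request 1', 'request 2']
print(split_lines_open)    # []
print(split_lines_closed)  # ['request 1', 'request 2']
```

If that assumption holds, a reader driven by the listed length sees nothing
while the file is open, and sees everything once the length is updated.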

What am I doing wrong?

Thanks!

Lucas


On Sat, Feb 23, 2013 at 4:37 AM, Hemanth Yamijala
<yhemanth@thoughtworks.com> wrote:

> Could you please clarify: are you opening the file in your mapper code and
> reading from there?
>
> Thanks
> Hemanth
>
> On Friday, February 22, 2013, Lucas Bernardi wrote:
>
>> Hello there, I'm trying to use hadoop map reduce to process an open file.
>> The writing process writes a line to the file and syncs the file to
>> readers (org.apache.hadoop.fs.FSDataOutputStream.sync()).
>>
>> If I try to read the file from another process, it works fine, at least
>> using
>> org.apache.hadoop.fs.FSDataInputStream.
>>
>> hadoop fs -tail also works just fine.
>>
>> But it looks like map reduce doesn't read any data. I tried using the
>> word count example, same thing: it is as if the file were empty for the
>> map reduce framework.
>>
>> I'm using Hadoop 1.0.3 and Pig 0.10.0.
>>
>> I need some help around this.
>>
>> Thanks!
>>
>> Lucas
>>
>

--14dae93b57beb0e71d04d6648671--