Subject: Re: Number of records in an HDFS file
From: Mix Nin <pig.mixed@gmail.com>
To: user@hadoop.apache.org
Date: Mon, 13 May 2013 11:16:25 -0700

OK, let me modify my requirement. I should have specified it in the beginning itself.

I need to get the count of records in an HDFS file created by a Pig script and then store the count in a text file. This should be done automatically, on a daily basis, without manual intervention.

On Mon, May 13, 2013 at 11:13 AM, Rahul Bhattacharjee <rahul.rec.dgp@gmail.com> wrote:

> How about the second approach: get the application/job ID which Pig
> creates and submits to the cluster, and then find the job output counter
> for that job from the JobTracker.
>
> Thanks,
> Rahul
>
> On Mon, May 13, 2013 at 11:37 PM, Mix Nin <pig.mixed@gmail.com> wrote:
>
>> It is a text file.
>>
>> If we want to use wc, we need to copy the file from HDFS and then run wc,
>> and this may take time. Is there a way without copying the file from HDFS
>> to a local directory?
>>
>> Thanks
>>
>> On Mon, May 13, 2013 at 11:04 AM, Rahul Bhattacharjee
>> <rahul.rec.dgp@gmail.com> wrote:
>>
>>> A few pointers:
>>>
>>> What kind of files are we talking about? For text you can use wc; for
>>> Avro data files you can use avro-tools.
>>>
>>> Or get the job that Pig generates and fetch the counters for that job
>>> from the JobTracker of your Hadoop cluster.
>>>
>>> Thanks,
>>> Rahul
>>>
>>> On Mon, May 13, 2013 at 11:21 PM, Mix Nin <pig.mixed@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> What is the best way to get the count of records in an HDFS file
>>>> generated by a Pig script?
>>>>
>>>> Thanks
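For the text-file case discussed above, the count can be taken without copying anything out of HDFS by streaming the file through wc. A minimal sketch, assuming the Pig output lives under a hypothetical /data/pig/output path with one record per line; the avro-tools jar name and part-file name are likewise assumptions:

  # Count records in a plain-text HDFS file by streaming it through wc -l;
  # nothing is copied to the local filesystem (path is hypothetical).
  hadoop fs -cat /data/pig/output/part-* | wc -l

  # For Avro data files, avro-tools can decode the records first; this assumes
  # a local copy of one part file and an avro-tools jar on the machine.
  java -jar avro-tools-1.7.7.jar tojson part-00000.avro | wc -l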
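Rahul's JobTracker suggestion can also be scripted with the Hadoop 1.x job CLI. A sketch only, assuming a JobTracker-era (Hadoop 1.x) cluster; the job ID below is made up, and which counter reflects the records written depends on whether the final stage of the Pig job is map-only or has reducers:

  # List jobs known to the JobTracker to find the ID of the job Pig submitted.
  hadoop job -list all

  # Print a single counter for that job. The group/counter names are the
  # Hadoop 1.x built-ins; use REDUCE_OUTPUT_RECORDS if the final stage has
  # reducers. The job ID is hypothetical.
  hadoop job -counter job_201305130001_0042 'org.apache.hadoop.mapred.Task$Counter' MAP_OUTPUT_RECORDS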
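And for the daily, unattended requirement in the top post, a small cron-driven script can combine the streaming count with a write back to a text file. A rough sketch only: every path, the date layout, and the output filename are assumptions. (Another option, not shown here, is to add a GROUP ... ALL / COUNT and a second STORE inside the Pig script itself.)

  #!/usr/bin/env bash
  # Daily record count for a Pig-generated HDFS output, stored as a text file.
  # Intended to be run from cron, e.g.:  30 2 * * * /opt/scripts/daily_count.sh
  set -euo pipefail

  DAY=$(date +%Y-%m-%d)
  OUTPUT_DIR="/data/pig/output/${DAY}"       # directory the Pig script STOREs into (assumed layout)
  COUNT_FILE="/data/pig/counts/${DAY}.txt"   # text file that will hold the count

  # Stream the text output straight out of HDFS and count lines; nothing is
  # copied to the local filesystem.
  COUNT=$(hadoop fs -cat "${OUTPUT_DIR}/part-*" | wc -l)

  # Write the count back into HDFS as a small text file ('-put -' reads stdin).
  echo "${COUNT}" | hadoop fs -put - "${COUNT_FILE}"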