hadoop-user mailing list archives

From Mix Nin <pig.mi...@gmail.com>
Subject Re: Number of records in an HDFS file
Date Mon, 13 May 2013 18:16:25 GMT
OK, let me modify my requirement. I should have specified this at the
beginning.

I need to get the count of records in an HDFS file created by a Pig script
and then store the count in a text file. This should be done automatically
on a daily basis, without manual intervention.
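
A minimal sketch of one way to do this for a text file, assuming the `hadoop`
command is on the PATH and that one record corresponds to one line. It streams
the file with `hadoop fs -cat` instead of copying it to a local directory, and
counts newlines the way `wc -l` would. The function names `count_lines` and
`write_count` and any paths passed to them are hypothetical, not part of
Hadoop or Pig.

```python
import subprocess
from typing import Sequence


def count_lines(cat_command: Sequence[str]) -> int:
    """Run a command that streams a file to stdout and count its newlines."""
    count = 0
    with subprocess.Popen(cat_command, stdout=subprocess.PIPE) as proc:
        # Read fixed-size chunks so a large file never has to fit in memory.
        for chunk in iter(lambda: proc.stdout.read(1 << 20), b""):
            count += chunk.count(b"\n")
    # Leaving the with-block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError(f"command failed with exit code {proc.returncode}")
    return count


def write_count(hdfs_path: str, out_file: str) -> int:
    """Count the lines under hdfs_path and store the number in out_file."""
    # `hadoop fs -cat` streams the contents without saving a local copy.
    # Assumption: for a Pig output directory, a pattern such as
    # <output dir>/part-* is what needs to be passed here.
    n = count_lines(["hadoop", "fs", "-cat", hdfs_path])
    with open(out_file, "w") as f:
        f.write(f"{n}\n")
    return n
```

Calling `write_count` from a small script that a scheduler such as cron runs
after the Pig job finishes would cover the daily, unattended part. The whole
file is still read once over the network, so for very large outputs the job
counters suggested in the quoted replies may be cheaper.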


On Mon, May 13, 2013 at 11:13 AM, Rahul Bhattacharjee <
rahul.rec.dgp@gmail.com> wrote:

> How about the second approach: get the application/job ID that Pig
> creates and submits to the cluster, and then find the output counter for
> that job from the JT?
>
> Thanks,
> Rahul
>
>
> On Mon, May 13, 2013 at 11:37 PM, Mix Nin <pig.mixed@gmail.com> wrote:
>
>> It is a text file.
>>
>> If we want to use wc, we need to copy the file from HDFS first, and
>> this may take time. Is there a way to do it without copying the file from
>> HDFS to a local directory?
>>
>> Thanks
>>
>>
>> On Mon, May 13, 2013 at 11:04 AM, Rahul Bhattacharjee <
>> rahul.rec.dgp@gmail.com> wrote:
>>
>>> A few pointers.
>>>
>>> What kind of files are we talking about? For text you can use wc; for
>>> Avro data files you can use avro-tools.
>>>
>>> Or find the job that Pig generates and get the counters for that job
>>> from the JT of your Hadoop cluster.
>>>
>>> Thanks,
>>>  Rahul
>>>
>>>
>>> On Mon, May 13, 2013 at 11:21 PM, Mix Nin <pig.mixed@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> What is the best way to get the count of records in an HDFS file
>>>> generated by a Pig script?
>>>>
>>>> Thanks
>>>>
>>>>
>>>
>>
>
