Subject: Re: Number of records in an HDFS file
From: Mix Nin <pig.mixed@gmail.com>
To: user@hadoop.apache.org
Date: Mon, 13 May 2013 11:16:25 -0700

OK, let me modify my requirement. I should have specified it in the beginning itself.

I need to get the count of records in an HDFS file created by a Pig script and then store the count in a text file. This should be done automatically, on a daily basis, without manual intervention.

On Mon, May 13, 2013 at 11:13 AM, Rahul Bhattacharjee <rahul.rec.dgp@gmail.com> wrote:

> How about the second approach: get the application/job ID which Pig
> creates and submits to the cluster, and then find the job output counter
> for that job from the JobTracker.
>
> Thanks,
> Rahul
>
> On Mon, May 13, 2013 at 11:37 PM, Mix Nin <pig.mixed@gmail.com> wrote:
>
>> It is a text file.
>>
>> If we want to use wc, we need to copy the file from HDFS and then run wc,
>> and this may take time. Is there a way without copying the file from HDFS
>> to a local directory?
>>
>> Thanks
>>
>> On Mon, May 13, 2013 at 11:04 AM, Rahul Bhattacharjee
>> <rahul.rec.dgp@gmail.com> wrote:
>>
>>> A few pointers:
>>>
>>> What kind of files are we talking about? For text you can use wc; for
>>> Avro data files you can use avro-tools.
>>>
>>> Or get the job that Pig generates and fetch the counters for that job
>>> from the JobTracker of your Hadoop cluster.
>>>
>>> Thanks,
>>> Rahul
>>>
>>> On Mon, May 13, 2013 at 11:21 PM, Mix Nin <pig.mixed@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> What is the best way to get the count of records in an HDFS file
>>>> generated by a Pig script?
>>>>
>>>> Thanks
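For the text-file case discussed above, the count can be taken without copying anything out of HDFS by streaming the file through wc. A minimal sketch, assuming the Pig output lives under a hypothetical /data/pig/output path with one record per line; the avro-tools jar name and part-file name are likewise assumptions:

  # Count records in a plain-text HDFS file by streaming it through wc -l;
  # nothing is copied to the local filesystem (path is hypothetical).
  hadoop fs -cat /data/pig/output/part-* | wc -l

  # For Avro data files, avro-tools can decode the records first; this assumes
  # a local copy of one part file and an avro-tools jar on the machine.
  java -jar avro-tools-1.7.7.jar tojson part-00000.avro | wc -l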
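Rahul's JobTracker suggestion can also be scripted with the Hadoop 1.x job CLI. A sketch only, assuming a JobTracker-era (Hadoop 1.x) cluster; the job ID below is made up, and which counter reflects the records written depends on whether the final stage of the Pig job is map-only or has reducers:

  # List jobs known to the JobTracker to find the ID of the job Pig submitted.
  hadoop job -list all

  # Print a single counter for that job. The group/counter names are the
  # Hadoop 1.x built-ins; use REDUCE_OUTPUT_RECORDS if the final stage has
  # reducers. The job ID is hypothetical.
  hadoop job -counter job_201305130001_0042 'org.apache.hadoop.mapred.Task$Counter' MAP_OUTPUT_RECORDS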
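And for the daily, unattended requirement in the top post, a small cron-driven script can combine the streaming count with a write back to a text file. A rough sketch only: every path, the date layout, and the output filename are assumptions. (Another option, not shown here, is to add a GROUP ... ALL / COUNT and a second STORE inside the Pig script itself.)

  #!/usr/bin/env bash
  # Daily record count for a Pig-generated HDFS output, stored as a text file.
  # Intended to be run from cron, e.g.:  30 2 * * * /opt/scripts/daily_count.sh
  set -euo pipefail

  DAY=$(date +%Y-%m-%d)
  OUTPUT_DIR="/data/pig/output/${DAY}"       # directory the Pig script STOREs into (assumed layout)
  COUNT_FILE="/data/pig/counts/${DAY}.txt"   # text file that will hold the count

  # Stream the text output straight out of HDFS and count lines; nothing is
  # copied to the local filesystem.
  COUNT=$(hadoop fs -cat "${OUTPUT_DIR}/part-*" | wc -l)

  # Write the count back into HDFS as a small text file ('-put -' reads stdin).
  echo "${COUNT}" | hadoop fs -put - "${COUNT_FILE}"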