Date: Tue, 8 Mar 2011 05:29:47 -0700
Subject: Re: How to count rows of output files?
From: James Seigel <james@tynt.com>
To: common-user@hadoop.apache.org

In the simplest case, if you need the total number of lines across A, B,
and C, look at the "Reduce output records" figure that the job normally
reports. As the others are telling you, this is a counter: you can read
it in your driver and print it out explicitly, or just read it by eye in
the job summary printed when the job is done.

Cheers
James.

On Tue, Mar 8, 2011 at 3:29 AM, Harsh J <qwertymaniac@gmail.com> wrote:
> I think my previous reply wasn't very accurate. So you need a count
> per file? One way I can think of doing that within the job itself is
> to use a Counter named after "the output's name + the task's ID". But
> that would not be a good solution if there are several hundred tasks.
>
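A minimal sketch of the post-job half of that scheme, in plain Java with no Hadoop dependency. It assumes each reduce task incremented a counter named "<output>#<taskID>" in some group (for example via context.getCounter(group, name).increment(1) in the new API), and that after the job the values were copied into a Map, e.g. by iterating job.getCounters().getGroup(group). The "#" separator, the class and method names, and the task IDs in main are all hypothetical, chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch: per-task counters named "<output>#<taskId>" are summed into
 * one total per output name after the job has finished.
 */
public class PerFileCounterTotals {

    // Hypothetical separator between the output name and the task ID.
    static final String SEP = "#";

    // Build the counter name for one (output, task) pair.
    static String counterName(String outputName, String taskId) {
        return outputName + SEP + taskId;
    }

    // Sum per-task counter values into one total per output name.
    // Counters whose names do not contain the separator are skipped.
    static Map<String, Long> perFileTotals(Map<String, Long> counters) {
        Map<String, Long> totals = new TreeMap<>(); // sorted for stable printing
        for (Map.Entry<String, Long> e : counters.entrySet()) {
            int i = e.getKey().indexOf(SEP);
            if (i < 0) {
                continue;
            }
            String output = e.getKey().substring(0, i);
            totals.merge(output, e.getValue(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up values standing in for counters read back from a finished job.
        Map<String, Long> counters = new HashMap<>();
        counters.put(counterName("A", "task_0001_r_000000"), 120L);
        counters.put(counterName("A", "task_0001_r_000001"), 80L);
        counters.put(counterName("B", "task_0001_r_000000"), 45L);
        counters.put(counterName("C", "task_0001_r_000001"), 7L);
        System.out.println(perFileTotals(counters)); // prints {A=200, B=45, C=7}
    }
}
```

Note that Hadoop already sums counters with the same group and name across all tasks, so the task-ID suffix only pays off if you also want a count per part file; if a per-output total is all you need, a counter named after the output alone gives it directly.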
> A distributed count of a single file can, however, be done with an
> identity mapper and a null output format, then reading the "Map input
> records" counter after the job completes.
>
> On Tue, Mar 8, 2011 at 3:54 PM, Harsh J <qwertymaniac@gmail.com> wrote:
>> Count the records as you write them out, using the Counters
>> functionality of Hadoop Map/Reduce (if you're using MultipleOutputs,
>> it has a way to enable counters for each named output). You can then
>> aggregate related counters after the job, if needed.
>>
>> On Tue, Mar 8, 2011 at 3:11 PM, Jun Young Kim <juneng603@gmail.com> wrote:
>>> Hi.
>>>
>>> My Hadoop application generates several output files from a single
>>> job (for example, A, B, and C).
>>>
>>> After the job finishes, I want to count the rows in each file.
>>>
>>> Is there any way to count the rows of each file?
>>>
>>> Thanks.
>>>
>>> --
>>> Junyoung Kim (juneng603@gmail.com)
>>>
>>
>> --
>> Harsh J
>> www.harshj.com
>>
>