hadoop-common-user mailing list archives

From Amogh Vasekar <am...@yahoo-inc.com>
Subject RE: Counting no. of keys.
Date Tue, 04 Aug 2009 05:38:12 GMT
Have you had a look at the reporter.counter Hadoop provides? I think it might be helpful in
your case, wherein you can aggregate locally for each map task and then push it to global
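
The local-aggregation idea above can be sketched in plain Java, without any Hadoop dependency. This is only an illustration under assumptions: `GlobalCounter`, `runMapTask`, and `countRecords` are hypothetical names, and the in-memory counter stands in for a job-wide counter (in the old `org.apache.hadoop.mapred` API the corresponding call would be `Reporter.incrCounter`). Each simulated map task counts its records locally and publishes the total once.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

final class LocalCounterSketch {
    // Hypothetical stand-in for a job-wide counter; in a real job the
    // framework would aggregate the per-task values.
    static final class GlobalCounter {
        private final AtomicLong value = new AtomicLong();
        void increment(long amount) { value.addAndGet(amount); }
        long get() { return value.get(); }
    }

    // One simulated map task: count records locally, publish once at the end.
    static void runMapTask(List<String> split, GlobalCounter counter) {
        long local = 0;
        for (String record : split) {
            local++; // the real map logic for `record` would go here
        }
        counter.increment(local); // single update per task, not per record
    }

    // Run every simulated task (in parallel) and return the global total.
    static long countRecords(List<List<String>> splits) {
        GlobalCounter counter = new GlobalCounter();
        splits.parallelStream().forEach(split -> runMapTask(split, counter));
        return counter.get();
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
            Arrays.asList("doc1", "doc2"),
            Arrays.asList("doc3"),
            Arrays.asList("doc4", "doc5", "doc6"));
        System.out.println(countRecords(splits)); // prints 6
    }
}
```

Updating the shared counter once per task rather than once per record keeps contention low. The sketch says nothing about when a real job makes the aggregated value visible to other tasks.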

-----Original Message-----
From: Zhong Wang [mailto:wangzhong.neu@gmail.com] 
Sent: Monday, August 03, 2009 6:31 PM
To: common-user@hadoop.apache.org
Subject: Re: Counting no. of keys.

I have the same question, but I want to use the number of map records in
the reduce phase, immediately after the map. This is very useful for solving
problems like TF-IDF. In the reduce (IDF calculation) phase, you must know
the total number of documents. Is there any way to solve the
problem without running two Map-Reduce jobs?
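
To make the dependency concrete, here is a minimal sketch of the IDF step, assuming the common definition idf(t) = ln(N / df(t)), where N is the total number of documents and df(t) is the number of documents containing term t. The class and method names are hypothetical. The reducer can compute df(t) from its own input, but N has to come from somewhere else.

```java
final class IdfSketch {
    // idf = ln(totalDocs / docsWithTerm); totalDocs is the value that is
    // not known inside a single reduce call.
    static double idf(long totalDocs, long docsWithTerm) {
        if (totalDocs <= 0 || docsWithTerm <= 0 || docsWithTerm > totalDocs) {
            throw new IllegalArgumentException("need 0 < docsWithTerm <= totalDocs");
        }
        return Math.log((double) totalDocs / docsWithTerm);
    }

    public static void main(String[] args) {
        // A term found in 10 of 1000 documents.
        System.out.println(idf(1000, 10));
    }
}
```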

On Sun, Aug 2, 2009 at 2:08 PM, Ted Dunning<ted.dunning@gmail.com> wrote:
> Sure.  Write a word count map-reduce program.  The mapper outputs the key
> from the sequence file as the output key and includes a count.  Then you do
> the normal combiner and reducer from a normal word count program.
> On Sat, Aug 1, 2009 at 9:53 PM, prashant ullegaddi <prashullegaddi@gmail.com> wrote:
>> Hi,
>> I have, say, 800 sequence files written using SequenceFileOutputFormat. Is
>> there any way to know the number of unique keys in those sequence files?
>> Thanks,
>> Prashant.
> --
> Ted Dunning, CTO
> DeepDyve

Zhong Wang
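
The word-count approach in Ted Dunning's quoted reply can be sketched in memory with plain Java. This is an illustration, not Hadoop code: the class and method names are hypothetical, and each inner list stands for the keys read from one sequence file. The mapper emits (key, 1), the combiner sums per file, and the reducer sums across files. The number of unique keys is then the number of reducer output records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class UniqueKeyCountSketch {
    // "Map" plus "combine" for one file: emit (key, 1) and sum locally.
    static Map<String, Long> mapAndCombine(List<String> keysInFile) {
        Map<String, Long> partial = new TreeMap<>();
        for (String key : keysInFile) {
            partial.merge(key, 1L, Long::sum);
        }
        return partial;
    }

    // "Reduce": sum the partial counts for each key across all files.
    static Map<String, Long> reduce(List<Map<String, Long>> partials) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((key, count) -> totals.merge(key, count, Long::sum));
        }
        return totals;
    }

    // Whole pipeline: one output entry per distinct key.
    static Map<String, Long> countKeys(List<List<String>> files) {
        List<Map<String, Long>> partials = new ArrayList<>();
        for (List<String> file : files) {
            partials.add(mapAndCombine(file));
        }
        return reduce(partials);
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
            Arrays.asList("a", "b", "a"),
            Arrays.asList("b", "c"));
        Map<String, Long> totals = countKeys(files);
        System.out.println(totals);        // prints {a=2, b=2, c=1}
        System.out.println(totals.size()); // prints 3, the number of unique keys
    }
}
```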
