hadoop-common-user mailing list archives

From Joey Echeverria <j...@cloudera.com>
Subject Re: What's the easiest way to count the number of <Key, Value> pairs in a directory?
Date Fri, 20 May 2011 16:57:16 GMT
What format is the input data in?

At first glance, I would run an identity mapper and use a
NullOutputFormat so you don't get any data written. The built-in
counters already count the number of key, value pairs read in by the
mappers.

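A minimal sketch of that approach, not a tested utility. It assumes the
newer org.apache.hadoop.mapreduce API (Hadoop 2 or later) and that the
input is SequenceFiles; swap in whatever InputFormat matches your data.
The class name CountPairs and the buildJob helper are made up for
illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.TaskCounter;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

// Hypothetical driver: counts <Key, Value> pairs under an input directory.
public class CountPairs {

    // Builds a map-only job that reads every record and writes nothing.
    static Job buildJob(Configuration conf, Path input) throws Exception {
        Job job = Job.getInstance(conf, "count-pairs");
        job.setJarByClass(CountPairs.class);

        // Assumption: the directory holds SequenceFiles.
        job.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.addInputPath(job, input);

        // The base Mapper class passes each pair through unchanged,
        // so it serves as the identity mapper.
        job.setMapperClass(Mapper.class);
        job.setNumReduceTasks(0);

        // Discard all output; only the counters matter here.
        job.setOutputFormatClass(NullOutputFormat.class);
        return job;
    }

    public static void main(String[] args) throws Exception {
        Job job = buildJob(new Configuration(), new Path(args[0]));
        if (!job.waitForCompletion(true)) {
            System.exit(1);
        }
        // Built-in counter for the number of records fed to the mappers.
        long pairs = job.getCounters()
                .findCounter(TaskCounter.MAP_INPUT_RECORDS)
                .getValue();
        System.out.println(pairs);
    }
}
```

Run it with the input directory as the only argument. The same number
shows up as "Map input records" in the job's counter output. Note that
this still deserializes each value on the way in, so it does not help
with the bonus question.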

On Fri, May 20, 2011 at 9:34 AM, W.P. McNeill <billmcn@gmail.com> wrote:
> I've got a directory with a bunch of MapReduce data in it.  I want to know
> how many <Key, Value> pairs it contains.  I could write a mapper-only
> process that takes <Writable, Writable> pairs as input and updates a
> counter, but it seems like this utility should already exist.  Does it, or
> do I have to roll my own?
> Bonus question, is there a way to count the number of <Key, Value> pairs
> without deserializing the values?  This can be expensive for the data I'm
> working with.

Joseph Echeverria
Cloudera, Inc.
