hadoop-common-user mailing list archives

From Andy Liu <andyliu1...@gmail.com>
Subject Total number of records processed in mapper
Date Tue, 14 Apr 2009 16:19:31 GMT
Is there a way for all the reducers to have access to the total number of
records that were processed in the Map phase?

For example, I'm trying to perform a simple document frequency calculation.
During the map phase, I emit <word, 1> pairs for every unique word in every
document.  During the reduce phase, I sum the values for each word group.
Then I want to divide that value by the total number of documents.
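
To make the calculation concrete, here is a minimal sketch in plain Python (not Hadoop code). The function names map_document, reduce_word, and document_frequency are hypothetical, and the tiny in-memory grouping stands in for the shuffle. Note that total_docs is simply handed to the reducer here, which is exactly the piece that is unclear in a real job.

```python
from collections import defaultdict

def map_document(text):
    # Emit (word, 1) once per unique word in the document.
    return [(word, 1) for word in set(text.lower().split())]

def reduce_word(word, counts, total_docs):
    # Sum the per-document 1s, then divide by the total number of documents.
    return word, sum(counts) / total_docs

def document_frequency(docs):
    # Group mapper output by word (a stand-in for the shuffle phase).
    groups = defaultdict(list)
    for doc in docs:
        for word, one in map_document(doc):
            groups[word].append(one)
    # Assumption: the total is known up front; in a real job the
    # reducers have no direct view of this number.
    total_docs = len(docs)
    return dict(reduce_word(w, c, total_docs) for w, c in groups.items())

docs = ["the cat sat", "the dog sat", "the bird flew"]
df = document_frequency(docs)
```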

I suppose I can create a whole separate m/r job whose sole purpose is to
count all the records, then pass that number on.  Is there a more
straightforward way of doing this?
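
The two-job workaround could be sketched roughly as follows, again in plain Python rather than Hadoop code. The names count_records_job and df_job are hypothetical. In Hadoop itself the count would presumably be passed to the second job through its configuration; that detail is an assumption and is not shown.

```python
def count_records_job(docs):
    # Job 1: each map call emits ("total", 1); one reducer sums the values.
    emitted = [("total", 1) for _ in docs]
    return sum(value for _, value in emitted)

def df_job(docs, total_docs):
    # Job 2: total_docs arrives as a plain parameter, standing in for
    # whatever mechanism hands the number to every reducer.
    sums = {}
    for doc in docs:
        for word in set(doc.split()):
            sums[word] = sums.get(word, 0) + 1
    return {word: n / total_docs for word, n in sums.items()}

docs = ["a b", "a c", "a b d", "e"]
total = count_records_job(docs)   # first pass over the input
result = df_job(docs, total)      # second pass, using the count
```

The cost of this approach is a second full pass over the input just to obtain one number.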

