hadoop-common-user mailing list archives

From John Armstrong <john.armstr...@ccri.com>
Subject Re: one question in the book of "hadoop:definitive guide 2 edition"
Date Tue, 02 Aug 2011 13:34:57 GMT
On Tue, 2 Aug 2011 21:25:47 +0800 (CST), "Daniel,Wu" <hadoop_wu@163.com> wrote:
> On page 243:
> As I understand it, the reducer is supposed to output the first value
> (the maximum) for each year, but I don't see how that works.
> Suppose we have the data
> 1901  200
> 1901  300
> 1901  400
> Since grouping is done by year, we have only one group, but we have
> different keys, as the key is a combination of year and temperature. For
> reduce, the output should be (key, list(value)) pairs; since we have 3
> keys, we should output 3 rows, but since we have only one group, we
> output only 1 row. So how is this conflict resolved? What am I
> misunderstanding?

Keep reading the section in the book:

"This still isn't enough to achieve our goal, however.  A partitioner
ensures only that one reducer receives all the records for a year; it
doesn't change the fact that the reducer groups by key within the
partition... The final piece of the puzzle is the setting to control the
grouping.  If we group values in the reducer by the year part of the key,
then we will see all the records for the same year in one reduce group. 
And since they are sorted by temperature in descending order, the first is
the maximum temperature."
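
To make the difference concrete, here is a minimal sketch in plain Python.
It is not Hadoop code and does not use the Hadoop API; the function names
are made up for illustration. It simulates the two pieces the book
describes, a sort by (year ascending, temperature descending) and a
grouping step, using the sample data from the question.

```python
from itertools import groupby


def sort_composite_keys(records):
    """Order (year, temperature) keys by year ascending, temperature descending."""
    return sorted(records, key=lambda key: (key[0], -key[1]))


def first_key_per_full_key_group(records):
    """Default-style grouping: each distinct (year, temperature) key is its own group."""
    ordered = sort_composite_keys(records)
    return [key for key, _ in groupby(ordered)]


def first_key_per_year_group(records):
    """Grouping by the year part only: one group per year, keep the first key seen."""
    ordered = sort_composite_keys(records)
    result = []
    for _, group in groupby(ordered, key=lambda key: key[0]):
        # Within a year the keys arrive in descending temperature order,
        # so the first key of the group carries the maximum.
        result.append(next(group))
    return result


# Sample data from the question: one year, three temperatures.
sample = [(1901, 200), (1901, 300), (1901, 400)]

print(first_key_per_full_key_group(sample))  # three groups, three rows
print(first_key_per_year_group(sample))      # one group, one row
```

With grouping on the whole key, the three distinct keys give three groups
and three rows. With grouping on the year alone, there is one group, and
the first key in it is (1901, 400).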

That is, in that example they also change the way the reducer groups its
input: by the year part of the key rather than by the whole key.
