hadoop-common-user mailing list archives

From John Armstrong <john.armstr...@ccri.com>
Subject Re:Re: one question in the book of "hadoop:definitive guide 2 edition"
Date Tue, 02 Aug 2011 13:59:02 GMT
On Tue, 2 Aug 2011 21:49:22 +0800 (CST), "Daniel,Wu" <hadoop_wu@163.com>
wrote:
> we usually use something like values.next() to loop over every row in a
> specific group, but I didn't see any code that loops over the list. At
> least it needs to get the first row in the list, with something like
> values.get().
> Or will NullWritable.get() get the first row in the group?

No; like you said before, the value is now in the key.

The grouping comparator receives (1900,35),(1900,34),(1900,34), and so on.
Due to the line

return -IntPair.compare(ip1.getSecond(),ip2.getSecond());

in the KeyComparator, these are guaranteed to arrive in descending order of
the second slot.  That is, if 35 is the maximum temperature then (1900,35)
will come before ANY other (1900,t).  Then, as the GroupComparator does its
thing (it compares only the first slot, the year), any time (1900,t) comes
up it gets compared AND FOUND EQUAL TO (1900,35), and thus its (null) value
is added to the (1900,35) group.

The reducer then gets a (1900,35) key with an Iterable of null values.  It
pretty much ignores the values and just emits the key, which contains the
maximum value.
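
If it helps to see the whole thing end to end, here is a rough plain-Java
sketch of the idea.  It does not use any Hadoop classes; it only imitates
the sort-then-group behaviour with java.util.Comparator.  The names
SecondarySortSketch and maxPerYear are made up for illustration, and the
nested IntPair is a simplified stand-in for the book's class, so treat it as
a sketch under those assumptions rather than the book's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java imitation of the secondary-sort trick; no Hadoop classes are used.
// SecondarySortSketch and maxPerYear are hypothetical names for illustration.
final class SecondarySortSketch {

    // Simplified stand-in for the composite key: (year, temperature).
    static final class IntPair {
        final int first;
        final int second;

        IntPair(int first, int second) {
            this.first = first;
            this.second = second;
        }
    }

    // Plays the role of the KeyComparator: year ascending, temperature descending.
    static final Comparator<IntPair> KEY_COMPARATOR = (a, b) -> {
        int cmp = Integer.compare(a.first, b.first);
        if (cmp != 0) {
            return cmp;
        }
        return -Integer.compare(a.second, b.second); // reverse order in the second slot
    };

    // Plays the role of the GroupComparator: only the year decides the group.
    static final Comparator<IntPair> GROUP_COMPARATOR =
            (a, b) -> Integer.compare(a.first, b.first);

    // Sorts the keys, then walks them in order. The first key of each group is
    // emitted; later keys that compare equal under GROUP_COMPARATOR belong to
    // the same group and are skipped, just as the reducer ignores its values.
    static Map<Integer, Integer> maxPerYear(List<IntPair> keys) {
        List<IntPair> sorted = new ArrayList<>(keys);
        sorted.sort(KEY_COMPARATOR);

        Map<Integer, Integer> result = new LinkedHashMap<>();
        IntPair groupKey = null;
        for (IntPair key : sorted) {
            if (groupKey == null || GROUP_COMPARATOR.compare(groupKey, key) != 0) {
                groupKey = key;                    // a new group starts here
                result.put(key.first, key.second); // emit the key only
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<IntPair> keys = Arrays.asList(
                new IntPair(1900, 34), new IntPair(1901, 36),
                new IntPair(1900, 35), new IntPair(1900, 34));
        System.out.println(maxPerYear(keys)); // prints {1900=35, 1901=36}
    }
}
```

The point to notice is that nothing ever scans the temperatures looking for
a maximum: the sort order puts it first, and the grouping makes sure only
that first key is seen per year.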

I admit, it's a pretty subtle trick, and I'm actually glad you brought it
up since I think I may be able to use it to solve a problem I've been
having...
