hadoop-common-user mailing list archives

From "Daniel,Wu" <hadoop...@163.com>
Subject Re:Re:Re:Re: one quesiton in the book of "hadoop:definitive guide 2 edition"
Date Wed, 03 Aug 2011 03:41:23 GMT
Or perhaps I should ask: should the reducer's input for the year 1900 group look like this
(key, value pairs)?
(1900,35), null
(1900,34), null
(1900,33), null


or like this?
(1900,35), null
(1900,35), null    ==> since (1900,34) is in the same group as (1900,35), it uses (1900,35) as the key
(1900,35), null


At 2011-08-03 10:35:51,"Daniel,Wu" <hadoop_wu@163.com> wrote:
>
>So the key of a group is determined by the first record that arrives in the group. If we have
>3 records in a group
>1: (1900,35)
>2: (1900,34)
>3: (1900,33)
>
>and (1900,35) comes in as the first row, then the resulting key will be (1900,35). When the
>second row (1900,34) comes in, it won't affect the key of the group, meaning it will not
>overwrite the key (1900,35) with (1900,34). Correct?
>
>>in the KeyComparator, these are guaranteed to come in reverse order in the second
>>slot.  That is, if 35 is the maximum temperature then (1900,35) will come before ANY other
>>(1900,t).  Then as the GroupComparator does its thing, any time (1900,t) comes up it gets
>>compared AND FOUND EQUAL TO (1900,35), and thus its (null) value is added to the (1900,35)
>>group.
>>
>>The reducer then gets a (1900,35) key with an Iterable of null values, which
>>it pretty much discards and just emits the key, which contains the maximum value.
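
To make the sort-then-group behavior described in the quoted reply concrete, here is a rough simulation in plain Python. It is not Hadoop code: key_comparator, group_key and simulate_reduce_input are hypothetical stand-ins for the KeyComparator, the GroupComparator and the framework's grouping step, and the (1901,10) record is made up for illustration. It only models which key is seen when a group starts, under the assumption that the first key in sorted order is the one handed to the reducer.

```python
from functools import cmp_to_key
from itertools import groupby


def key_comparator(a, b):
    """Stand-in for the KeyComparator: year ascending, temperature descending."""
    (year_a, temp_a), (year_b, temp_b) = a, b
    if year_a != year_b:
        return -1 if year_a < year_b else 1
    # Higher temperature sorts first within the same year.
    return (temp_b > temp_a) - (temp_b < temp_a)


def group_key(key):
    """Stand-in for the GroupComparator: only the year decides group membership."""
    return key[0]


def simulate_reduce_input(records):
    """Return a list of (first key of group, list of values), one entry per group."""
    ordered = sorted(records, key=lambda kv: cmp_to_key(key_comparator)(kv[0]))
    groups = []
    for _, members in groupby(ordered, key=lambda kv: group_key(kv[0])):
        members = list(members)
        first_key = members[0][0]  # assumption: the group is identified by its first key
        groups.append((first_key, [value for _, value in members]))
    return groups


# Map output in arbitrary order; the values are None, standing in for null.
records = [
    ((1900, 33), None),
    ((1900, 35), None),
    ((1901, 10), None),
    ((1900, 34), None),
]

for key, values in simulate_reduce_input(records):
    print(key, values)
```

In this model the 1900 group appears as a single call with key (1900, 35) and three null values, which matches the reply's description of one (1900,35) key with an Iterable of nulls rather than three separate keys.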
