hadoop-common-user mailing list archives

From John Armstrong <john.armstr...@ccri.com>
Subject Re:Re:Re: one quesiton in the book of "hadoop:definitive guide 2 edition"
Date Wed, 03 Aug 2011 12:02:34 GMT
On Wed, 3 Aug 2011 10:35:51 +0800 (CST), "Daniel,Wu" <hadoop_wu@163.com>
wrote:
> So the key of a group is determined by the first coming record in the
> group,  if we have 3 records in a group
> 1: (1900,35)
> 2:(1900,34)
> 3:(1900,33)
> 
> if (1900,35) comes in as the first row, then the result key will be
> (1900,35). When the second row (1900,34) comes in, it won't impact the
> key of the group, meaning it will not overwrite the key (1900,35) with
> (1900,34). Correct?

Effectively, yes.  Remember that internally it's using the grouping
comparator something like this:

(1900,35): do I have that key already? [searches the collection of keys
with, say, a BST] No! I'll add it here.
(1900,34): do I have that key already? [searches again, now getting a
result of 0 when comparing to (1900,35)] Yes! [It's not the same key, but
according to the GroupComparator it is!] So I'll add its value to the
existing key's iterable of values.
etc.
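
The walk-through above can be sketched in plain Java. This is only an
illustration of the mental model, not Hadoop's actual code: the class
GroupingSketch, the key type YearTemp, and the comparator GROUP_BY_YEAR are
hypothetical names, a TreeMap stands in for the "BST", and the temperature is
reused as the value just so there is something to collect.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupingSketch {

    /** Hypothetical composite key: (year, temperature). */
    static final class YearTemp {
        final int year;
        final int temp;

        YearTemp(int year, int temp) {
            this.year = year;
            this.temp = temp;
        }

        @Override
        public String toString() {
            return "(" + year + "," + temp + ")";
        }
    }

    /** Stand-in for a grouping comparator: it looks at the year only. */
    static final Comparator<YearTemp> GROUP_BY_YEAR =
            Comparator.comparingInt((YearTemp k) -> k.year);

    /**
     * Groups records in arrival order. Keys that compare as 0 share one
     * entry, and the first key seen for a group stays as the map key.
     */
    static Map<YearTemp, List<Integer>> group(List<YearTemp> records) {
        Map<YearTemp, List<Integer>> groups = new TreeMap<>(GROUP_BY_YEAR);
        for (YearTemp r : records) {
            // Absent (per the comparator): insert r as a new key.
            // Present: keep the existing key and append the value.
            groups.computeIfAbsent(r, k -> new ArrayList<>()).add(r.temp);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<YearTemp> records = List.of(
                new YearTemp(1900, 35),
                new YearTemp(1900, 34),
                new YearTemp(1900, 33));
        // prints {(1900,35)=[35, 34, 33]}
        System.out.println(group(records));
    }
}
```

Because (1900,34) and (1900,33) compare as equal to (1900,35) under the
year-only comparator, they never become keys of their own; only their values
are appended to the first key's list.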
