hadoop-common-user mailing list archives

From Ken <kengoodh...@gmail.com>
Subject Re: some questions about 'Hadoop:The.Definitive.Guide.'
Date Fri, 10 Sep 2010 02:53:55 GMT
Unfortunately, I don't have the Hadoop book memorized, but this is usually accomplished with
a secondary sort. Using the key-field-based comparator and partitioner is one way to do it.
You partition on part of the key (in this case the ID) and then sort on the rest of the
key in a way that guarantees a particular record always comes first.
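
To make that concrete, here is a rough sketch in plain Java (no Hadoop dependency) that simulates what one reducer would see. Everything in it is an assumption for illustration: the class name SecondarySortSketch, the Pair type, and the tag values "0" for a station and "1" for a weather record are hypothetical, and it is not the book's Example 8-14. In a real job the same three roles would be played by the partitioner, the sort comparator, and the grouping comparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SecondarySortSketch {

    /** Hypothetical composite key plus value: tag "0" marks a station, "1" a weather record. */
    public static final class Pair {
        final String id;
        final String tag;
        final String value;

        public Pair(String id, String tag, String value) {
            this.id = id;
            this.tag = tag;
            this.value = value;
        }
    }

    /** Partition on the ID only, so every record for one ID reaches the same reducer. */
    public static int partition(Pair p, int numReducers) {
        return (p.id.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Simulates the sort, grouping, and reduce steps for the records of one partition. */
    public static List<String> reduce(List<Pair> partitionInput) {
        // Sort on the full key: ID first, then tag, so "0" (station) precedes "1" (records).
        Comparator<Pair> byIdThenTag = (a, b) -> {
            int c = a.id.compareTo(b.id);
            return c != 0 ? c : a.tag.compareTo(b.tag);
        };
        List<Pair> sorted = new ArrayList<>(partitionInput);
        sorted.sort(byIdThenTag);

        List<String> out = new ArrayList<>();
        String currentId = null;
        String stationName = null;
        for (Pair p : sorted) {
            // Group on the ID only: a new ID starts a new group.
            if (!p.id.equals(currentId)) {
                currentId = p.id;
                stationName = null;
            }
            if (p.tag.equals("0")) {
                stationName = p.value; // the station record arrives first within its group
            } else {
                // One output line per weather record, each carrying the ID and station name.
                out.add(p.id + "\t" + stationName + "\t" + p.value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Pair> input = new ArrayList<>();
        // Values copied from the output quoted in the question; arrival order is deliberately mixed.
        input.add(new Pair("011990-99999", "1", "0067011990999991950051507004+68750..."));
        input.add(new Pair("011990-99999", "0", "SIHCCAJAVRI"));
        input.add(new Pair("011990-99999", "1", "0043011990999991950051512004+68750..."));
        for (String line : reduce(input)) {
            System.out.println(line);
        }
    }
}
```

Because the sort puts tag "0" ahead of tag "1" within each ID, the station record is the first value the reducer sees for that ID. The sketch emits one output line per weather record, each with the ID and station name written out again, which would also explain why the ID repeats in the output.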


On Sep 8, 2010, at 11:32 AM, leibnitz <se3g2011@gmail.com> wrote:

> 
> Hi all,
> While studying chapter 8 of that book, I came across some sentences that I
> can't understand and couldn't find explained in the Javadoc. They are:
> a. Reduce-side joins, on page 236, it says:
> "The reducer knows that it will receive the station record first, so it
> extracts its name
> from the value and writes it out as a part of every output record (Example
> 8-14)."
> Why are the station records always received first?
> 
> b. Example 8-15, on page 237, contains this code fragment:
> conf.setOutputValueGroupingComparator(TextPair.FirstComparator.class);
> I understand this to mean that keys are grouped by the first element of the
> pair. But in the output,
> 011990-99999 SIHCCAJAVRI 0067011990999991950051507004+68750...
> 011990-99999 SIHCCAJAVRI 0043011990999991950051512004+68750...
> the first element of the pair is '011990-99999', so why is it duplicated in
> the output?
> 
> Thanks in advance!
> 
> 
