hadoop-common-dev mailing list archives

From phonechen <phonec...@gmail.com>
Subject Re: Pre-sort value list in reduce
Date Tue, 15 Apr 2008 09:40:06 GMT
Hi Arkady,
I'm also confused about how the Hadoop framework does this job:
turning the many <key,value> pairs output by the map() phase into
<key,list of values> before the reduce() phase.
For example, the map() output is:
<hello,1>
<hello,1>
<world,1>
<hello,1>
<world,1>
but the reduce() input is:
<hello,[1,1,1]>
<world,[1,1]>
Can you point out which class takes care of this?
Thanks very much!
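
The grouping described above can be pictured as "sort by key, then merge runs of equal keys". The sketch below is a plain in-memory Python simulation of that idea, not Hadoop code; the function name shuffle and the variable names are made up for illustration, and the real framework does this across many map and reduce tasks rather than in a single list.

```python
from itertools import groupby
from operator import itemgetter


def shuffle(map_output):
    """Group (key, value) pairs into (key, [values]) by sorting on the key.

    Hypothetical helper: simulates the sort-then-group step between
    map() and reduce() for data that fits in memory.
    """
    # Sort on the key only; Python's sort is stable, so values keep
    # their original relative order within each key.
    ordered = sorted(map_output, key=itemgetter(0))
    # After sorting, equal keys are adjacent, so each run becomes one group.
    return [(key, [value for _, value in run])
            for key, run in groupby(ordered, key=itemgetter(0))]


map_output = [("hello", 1), ("hello", 1), ("world", 1), ("hello", 1), ("world", 1)]
reduce_input = shuffle(map_output)
print(reduce_input)
```

Each tuple in reduce_input corresponds to one reduce() call: the key plus the list of values collected for it.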

Best Regards,

Yours
Phonechen

On 4/15/08, arkady borkovsky <arkady@yahoo-inc.com> wrote:
>
> look at
>  -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner
>
> --ab
>
> On Apr 14, 2008, at 4:25 PM, pi song wrote:
>
> > Dear people in Hadoop mailing list,
> >
> > Is there any way to control the value list in reduce (Key, List of values)
> > so that it is sorted? Or at least sorted in clusters (containing runs of
> > sorted values, e.g. 1,1,1,2,2,2,2,3,3,3, 1,1,1,1,1,1,2,2,2,2,3,
> > 1,1,2,2,2,3,3,3,3,3,3,3)?
> > I had a look at JobConf.setOutputValueGroupingComparator in the javadoc and
> > I think it might be the answer, because I feel most of the time grouping in
> > Hadoop is done by sort. Am I right?
> >
> > Can anyone help me? How about the performance impact of your solution?
> >
> > Thanks in advance,
> > Pi
> >
>
>
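
On the original question about getting a sorted value list: the usual pattern is to sort on a composite of (key, value) but group on the key alone, which is the role a grouping comparator such as the one set by JobConf.setOutputValueGroupingComparator plays. The sketch below simulates that idea in plain Python under the assumption that all records fit in one in-memory list; secondary_sort and the sample data are hypothetical names for illustration, not part of Hadoop.

```python
from itertools import groupby


def secondary_sort(map_output):
    """Return (key, [values]) groups with each value list in sorted order.

    Hypothetical helper: sorts on the full (key, value) pair, then groups
    on the key only, mimicking a sort comparator plus a grouping comparator.
    """
    # Sort comparator: order by key first, then by value.
    ordered = sorted(map_output, key=lambda kv: (kv[0], kv[1]))
    # Grouping comparator: records with the same key form one group,
    # regardless of their values.
    return [(key, [value for _, value in run])
            for key, run in groupby(ordered, key=lambda kv: kv[0])]


pairs = [("a", 3), ("b", 2), ("a", 1), ("a", 2), ("b", 1)]
result = secondary_sort(pairs)
print(result)
```

In a real job the records for one key must also reach the same reducer, so the partitioner has to look at the key alone; that is where a key-field-based partitioner like the one mentioned above would come in.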


