hadoop-common-dev mailing list archives

From "pi song" <pi.so...@gmail.com>
Subject Pre-sort value list in reduce
Date Mon, 14 Apr 2008 23:25:29 GMT
Dear Hadoop mailing list members,

Is there any way to make the value list passed to reduce (Key, List of values)
arrive sorted? Or at least sorted in clusters, i.e. made up of several runs
that are each sorted, e.g.
1,1,1,2,2,2,2,3,3,3, 1,1,1,1,1,1,2,2,2,2,3, 1,1,2,2,2,3,3,3,3,3,3,3?
I had a look at JobConf.setOutputValueGroupingComparator in the javadoc, and I
think it might be the answer, because as far as I can tell grouping in Hadoop
is usually done by sorting. Am I right?
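
To make the idea concrete, here is a minimal sketch in plain Java (no Hadoop classes) of grouping done by sorting: each record carries a composite (key, value) pair, the records are sorted by key and then by value, and consecutive records with the same key are collected into one group. The class and method names (SecondarySortSketch, Pair, sortAndGroup) are hypothetical and exist only for illustration. In a real job the sort order and the grouping would presumably be supplied through JobConf.setOutputKeyComparatorClass and JobConf.setOutputValueGroupingComparator; that wiring is an assumption and is not shown here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {

    // Hypothetical composite key: the natural key plus the value to sort by.
    static final class Pair {
        final String key;
        final int value;

        Pair(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // Sort order: natural key first, then value within the same key.
    static final Comparator<Pair> SORT_ORDER =
            Comparator.comparing((Pair p) -> p.key)
                      .thenComparingInt(p -> p.value);

    // Simulates "sort, then group": after sorting by (key, value),
    // records with the same natural key are adjacent, so each group's
    // value list comes out already sorted.
    static Map<String, List<Integer>> sortAndGroup(List<Pair> mapOutput) {
        List<Pair> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT_ORDER);

        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (Pair p : sorted) {
            groups.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.value);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Pair> mapOutput = new ArrayList<>();
        mapOutput.add(new Pair("b", 3));
        mapOutput.add(new Pair("a", 2));
        mapOutput.add(new Pair("a", 1));
        mapOutput.add(new Pair("b", 1));
        mapOutput.add(new Pair("a", 3));

        // Each key is printed with its values in ascending order.
        System.out.println(sortAndGroup(mapOutput));
    }
}
```

The sketch sorts everything in memory, so it says nothing about the cost of doing the same thing in the shuffle; it only illustrates why a sort on (key, value) followed by grouping on key alone would yield sorted value lists.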

Can anyone help me? Also, what would the performance impact of your solution be?

Thanks in advance,
