hadoop-common-user mailing list archives

From "Joydeep Sen Sarma" <jssa...@facebook.com>
Subject RE: sort by value
Date Wed, 06 Feb 2008 19:58:39 GMT

> But it actually adds duplicate data (i.e., the value column which needs
> sorting) to the key.

Why? You can always take that field out of the value to remove the redundancy.
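A minimal plain-Java sketch of that idea, assuming the sort field can be moved out of the value and into a composite key. The names `CompositeKey`, `Pair`, `SORT_COMPARATOR` and `GROUPING_COMPARATOR` are hypothetical and only for illustration. The two comparators play the roles of the comparators set via setOutputKeyComparatorClass/setOutputValueGroupingComparator, but this simulates the sort-and-group step in memory; it is not Hadoop code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {
    // Composite key: the natural (grouping) key plus the field to sort on.
    // The sort field lives only here, so the value no longer duplicates it.
    record CompositeKey(String naturalKey, int sortField) {}

    // One map output record: composite key plus the remaining value payload.
    record Pair(CompositeKey key, String value) {}

    // Role of the key comparator: order by natural key, then by sort field.
    static final Comparator<CompositeKey> SORT_COMPARATOR =
            Comparator.comparing(CompositeKey::naturalKey)
                      .thenComparingInt(CompositeKey::sortField);

    // Role of the value-grouping comparator: only the natural key decides
    // which records end up in the same reduce call.
    static final Comparator<CompositeKey> GROUPING_COMPARATOR =
            Comparator.comparing(CompositeKey::naturalKey);

    // Sorts the records on the full key, then starts a new group whenever the
    // grouping comparator sees a different natural key. Within each group the
    // entries come out already ordered by sortField.
    static Map<String, List<String>> sortAndGroup(List<Pair> mapOutput) {
        List<Pair> sorted = new ArrayList<>(mapOutput);
        sorted.sort(Comparator.comparing(Pair::key, SORT_COMPARATOR));
        Map<String, List<String>> groups = new LinkedHashMap<>();
        CompositeKey groupKey = null;
        List<String> current = null;
        for (Pair p : sorted) {
            if (groupKey == null || GROUPING_COMPARATOR.compare(groupKey, p.key()) != 0) {
                groupKey = p.key();
                current = new ArrayList<>();
                groups.put(groupKey.naturalKey(), current);
            }
            // The "reducer" reads the sort field back out of the key.
            current.add(p.key().sortField() + ":" + p.value());
        }
        return groups;
    }

    static Map<String, List<String>> demo() {
        return sortAndGroup(List.of(
                new Pair(new CompositeKey("b", 2), "dave"),
                new Pair(new CompositeKey("a", 3), "carol"),
                new Pair(new CompositeKey("a", 1), "alice"),
                new Pair(new CompositeKey("b", 1), "erin"),
                new Pair(new CompositeKey("a", 2), "bob")));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // {a=[1:alice, 2:bob, 3:carol], b=[1:erin, 2:dave]}
    }
}
```

With more than one reducer, a real job would presumably also need a partitioner that hashes only the natural key, so that every record for one natural key reaches the same reducer.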

> Also, I wonder what is the benefit to sort values before reaching
> reducers. It can be achieved in the reduce phase anyway.

The reduce side only merges sorted segments. Those segments have to be
sorted on all the sort fields before the merge itself; otherwise you
can't do a merge. (I hope I understood the question right.)
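To illustrate why, here is a rough k-way merge in plain Java. `SegmentMergeSketch` is a hypothetical name, and this is not Hadoop's own merge code. The merge only ever compares the heads of the segments, so its output is globally ordered only when each input segment was already sorted with the same comparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class SegmentMergeSketch {
    // Cursor into one segment: which segment, and the next index to read.
    record Cursor(int segment, int index) {}

    // K-way merge: repeatedly emit the smallest head among all segments.
    // It never reorders entries inside a segment, so the result is fully
    // sorted only if every segment was sorted under cmp beforehand.
    static <T> List<T> merge(List<List<T>> segments, Comparator<? super T> cmp) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
                (x, y) -> cmp.compare(segments.get(x.segment()).get(x.index()),
                                      segments.get(y.segment()).get(y.index())));
        for (int s = 0; s < segments.size(); s++) {
            if (!segments.get(s).isEmpty()) {
                heap.add(new Cursor(s, 0));
            }
        }
        List<T> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            List<T> seg = segments.get(c.segment());
            out.add(seg.get(c.index()));
            if (c.index() + 1 < seg.size()) {
                heap.add(new Cursor(c.segment(), c.index() + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Comparator<Integer> natural = Comparator.naturalOrder();
        // Pre-sorted segments: the merge output is fully sorted.
        System.out.println(merge(List.of(List.of(1, 4, 7), List.of(2, 5), List.of(3, 6)), natural));
        // prints [1, 2, 3, 4, 5, 6, 7]
        // First segment not sorted beforehand: the merge cannot repair it.
        System.out.println(merge(List.of(List.of(4, 1, 7), List.of(2, 5)), natural));
        // prints [2, 4, 1, 5, 7]
    }
}
```

The second call shows an unsorted segment passing through the merge out of order, which is why the sort on all fields has to happen before the merge.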


-----Original Message-----
From: Qiong Zhang [mailto:jamesz@yahoo-inc.com] 
Sent: Wednesday, February 06, 2008 11:25 AM
To: core-user@hadoop.apache.org
Subject: sort by value


Hi, All,

Is there a better way to sort the values within the same key before they
reach the reducers?

I know it can be achieved by using
setOutputValueGroupingComparator/setOutputKeyComparatorClass.

But it actually adds duplicate data (i.e., the value column which needs
sorting) to the key.

Also, I wonder what is the benefit to sort values before reaching
reducers.
It can be achieved in the reduce phase anyway.

Thanks,
James
