hadoop-common-user mailing list archives

From lohit <lohit...@yahoo.com>
Subject Re: Tips on sorting using Hadoop
Date Sat, 20 Sep 2008 18:12:25 GMT
Since this is a sort, does it help to run map/reduce twice? The number of output bytes should
be the same as the number of input bytes.
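To do a total-order sort, your partition function has to split the keyspace into contiguous,
roughly equal ranges, one per reducer and in key order, so that every key sent to reducer i
sorts before every key sent to reducer i+1. Each reducer then sorts only its own range, and
the reducer outputs, read in order, are globally sorted.
For an example of how this is done, see TeraSort: http://svn.apache.org/repos/asf/hadoop/core/trunk/src/examples/org/apache/hadoop/examples/terasort/TeraSort.java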
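
To make the partitioning idea concrete, here is a minimal sketch in plain Python rather than Hadoop code. It is not TeraSort's implementation, and the function names (choose_split_points, partition, total_order_sort) are made up for illustration. It assumes the keys are directly comparable and that a small random sample is representative enough to pick reasonable cut points.

```python
import bisect
import random


def choose_split_points(sample_keys, num_reducers):
    """Pick num_reducers - 1 cut points from a sample of the keys."""
    ordered = sorted(sample_keys)
    if not ordered:
        return []  # no sample: everything falls into partition 0
    step = len(ordered) / num_reducers
    return [ordered[int(step * i)] for i in range(1, num_reducers)]


def partition(key, split_points):
    """Return the index of the reducer whose key range contains key.

    Keys below the first cut point go to reducer 0, keys at or above
    the last cut point go to the last reducer.
    """
    return bisect.bisect_right(split_points, key)


def total_order_sort(records, num_reducers, sample_size=100, seed=0):
    """Simulate a range-partitioned sort across num_reducers reducers."""
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    split_points = choose_split_points(sample, num_reducers)

    # "Map/shuffle" phase: route each key to the reducer owning its range.
    buckets = [[] for _ in range(num_reducers)]
    for key in records:
        buckets[partition(key, split_points)].append(key)

    # "Reduce" phase: each reducer sorts only its own bucket.
    sorted_buckets = [sorted(bucket) for bucket in buckets]

    # Reading the reducer outputs in order gives a globally sorted result.
    merged = [key for bucket in sorted_buckets for key in bucket]
    return merged, sorted_buckets


if __name__ == "__main__":
    data = list(range(1000, 0, -1))
    merged, parts = total_order_sort(data, num_reducers=4)
    print([len(p) for p in parts])
    print(merged[:5], merged[-5:])
```

The cut points only affect how evenly the work is spread; the output is sorted whatever they are, because the ranges are assigned to reducers in key order.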


----- Original Message ----
From: Edward J. Yoon <edwardyoon@apache.org>
To: core-user@hadoop.apache.org
Sent: Saturday, September 20, 2008 10:53:40 AM
Subject: Re: Tips on sorting using Hadoop

I would recommend running map/reduce twice.


On Sat, Sep 13, 2008 at 5:58 AM, Tenaali Ram <tenaaliram@gmail.com> wrote:
> Hi,
> I want to sort my records (consisting of string, int, float) using Hadoop.
> One way I have found is to set the number of reducers to 1, but this would mean
> all the records go to one reducer and it won't be optimized. Can anyone point
> me to a better way to do sorting using Hadoop?
> Thanks,
> Tenaali

Best regards, Edward J. Yoon
