hadoop-common-dev mailing list archives

From Prashant Sharma <prashant.ii...@gmail.com>
Subject Re: Hadoop - Distributed sorting
Date Sat, 26 Nov 2011 13:44:43 GMT

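  You can check out the sorting code in the examples. Actually, you don't
need to do anything for sorting. The MapReduce framework does the
(merge-sort) sorting for you; it happens during the shuffle phase, before
the reducer even starts. All you need to do is make the column you want to
sort on the key of your map output. Note that with more than one reducer,
each reducer's output is sorted only within itself; for a single globally
sorted output, use one reducer or a total-order partitioner.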

So for example you have

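    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // splitter (delimiter regex) and field (1-based column index) are
        // assumed to be fields of the mapper class.
        String[] tokenArray = value.toString().split(splitter);
        context.write(new Text(tokenArray[field - 1]), value);
    }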

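And in the reducer you don't need to do anything either; just write each
value back out:

    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Keys arrive already sorted, so emit the original lines as they come.
        for (Text val : values) {
            context.write(NullWritable.get(), val);
        }
    }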
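
To see the idea without a cluster, here is a minimal plain-Java sketch of the same flow. It is not Hadoop code: the class name SortByColumnSketch and the method sortByField are made up for illustration, and a TreeMap stands in for the sort the framework performs during the shuffle. It assumes every line has at least `field` columns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for the map -> shuffle/sort -> reduce flow (no Hadoop needed).
// Hypothetical class and method names, for illustration only.
class SortByColumnSketch {

    // Sorts lines by the 1-based column `field`, splitting each line on the regex `splitter`.
    static List<String> sortByField(List<String> lines, String splitter, int field) {
        // "Map" step: emit (column value, whole line). The TreeMap keeps keys in
        // sorted order, standing in for the framework's sort during the shuffle.
        Map<String, List<String>> shuffled = new TreeMap<>();
        for (String line : lines) {
            String key = line.split(splitter)[field - 1];
            shuffled.computeIfAbsent(key, k -> new ArrayList<>()).add(line);
        }
        // "Reduce" step: keys arrive in order; pass each value through unchanged.
        List<String> output = new ArrayList<>();
        for (List<String> values : shuffled.values()) {
            output.addAll(values);
        }
        return output;
    }
}
```

Calling SortByColumnSketch.sortByField(lines, ",", 2) would order comma-separated lines by their second column. As with Text keys in the real job, the comparison is on strings, so "10" sorts before "9"; a numeric key type would be needed for numeric order.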

On Sat, Nov 26, 2011 at 6:33 PM, madhu_sushmi <madhu_sushmi@yahoo.com> wrote:

> Hi,
> I need to implement distributed sorting using Hadoop. I am quite new to
> Hadoop and I am getting confused. If I want to implement merge sort, what
> should my map and reduce be doing? Should all the sorting happen on the
> reduce side?
> Please help. This is an urgent requirement. Please guide me.
> Thanks,
> Madhu
> --
> View this message in context:
> http://old.nabble.com/Hadoop---Distributed-sorting-tp32876785p32876785.html
> Sent from the Hadoop core-dev mailing list archive at Nabble.com.
