hadoop-common-user mailing list archives

From Chris Smith <csmi...@gmail.com>
Subject Re: Distributed sorting using Hadoop
Date Tue, 29 Nov 2011 15:59:13 GMT
Madhu,

Try working your way through the MapReduce tutorial here:
http://hadoop.apache.org/common/docs/r0.20.205.0/mapred_tutorial.html#Example%3A+WordCount+v1.0
It covers most of the concepts you need for a distributed sort.
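
To make the idea concrete, here is a rough plain-Java sketch of how a sort is usually laid out in MapReduce. It is not Hadoop code and not taken from the tutorial. It assumes the map step passes records through unchanged, a range partitioner routes each key to a "reducer" by value, and each reducer's input is sorted on its own (in Hadoop the framework sorts keys during the shuffle, so the reduce step can simply write them out). The class name, method names and split points are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Single-process simulation of a range-partitioned sort. All names here
// are hypothetical; none of them are Hadoop APIs.
class DistributedSortSketch {

    // Range partitioner: returns the index of the first split point that
    // is greater than the key, so every key in partition i is smaller
    // than every key in partition i + 1.
    static int partitionFor(int key, int[] splitPoints) {
        for (int i = 0; i < splitPoints.length; i++) {
            if (key < splitPoints[i]) {
                return i;
            }
        }
        return splitPoints.length;
    }

    static List<Integer> distributedSort(List<Integer> input, int[] splitPoints) {
        // "Map" step: emit each record unchanged, routed by key range.
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i <= splitPoints.length; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int key : input) {
            partitions.get(partitionFor(key, splitPoints)).add(key);
        }

        // "Shuffle/reduce" step: each partition is sorted independently
        // (this is the part that would run in parallel on a cluster).
        // Concatenating the partitions in order gives a total order.
        List<Integer> output = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            Collections.sort(partition);
            output.addAll(partition);
        }
        return output;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(42, 7, 19, 88, 3, 55, 61, 12);
        int[] splitPoints = {20, 60}; // assumed split points, three partitions
        System.out.println(distributedSort(input, splitPoints));
        // prints [3, 7, 12, 19, 42, 55, 61, 88]
    }
}
```

The point of the sketch is that neither Map nor Reduce needs to implement a merge sort by hand. The work is in choosing split points that spread the keys evenly across reducers.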

Search for the word "combiner" in the tutorial to understand how
results are combined on the map side to reduce cross-cluster traffic.
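
As a small illustration of what a combiner buys you, here is a plain-Java sketch (again not Hadoop code; the class and method names are hypothetical). It assumes a word count where each input split is pre-aggregated locally, so only one record per distinct word leaves each "node" instead of one record per token.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process simulation of map-side combining. All names here are
// hypothetical; none of them are Hadoop APIs.
class CombinerSketch {

    // Map plus combine for one input split: rather than emitting
    // (word, 1) for every token, sum the counts locally first.
    static Map<String, Integer> mapAndCombine(String split) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                partial.merge(word, 1, Integer::sum);
            }
        }
        return partial;
    }

    // Reduce: add up the partial counts received from every split.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> totals.merge(word, count, Integer::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Integer> a = mapAndCombine("to be or not to be");
        Map<String, Integer> b = mapAndCombine("to sort or not to sort");
        // 12 tokens were read, but only 4 + 4 = 8 records cross the "network".
        System.out.println(a.size() + b.size());
        System.out.println(reduce(Arrays.asList(a, b)));
        // prints {be=2, not=2, or=2, sort=2, to=4}
    }
}
```

This only works because addition is associative and commutative, so summing partial counts gives the same totals as summing the raw records.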

Also work your way through several of the tutorials and videos on
working with Hadoop - Google is your friend here.

Another good source on the general algorithms is Jimmy Lin's book
referenced on this page:
http://www.umiacs.umd.edu/~jimmylin/book.html

Regards,

Chris

On 26 November 2011 13:05, madhu_sushmi <madhu_sushmi@yahoo.com> wrote:
>
> Hi,
> I need to implement distributed sorting using Hadoop. I am quite new to
> Hadoop and I am getting confused. If I want to implement merge sort, what
> should my Map and Reduce be doing? Should all the sorting happen on the
> reduce side?
>
> Please help. This is an urgent requirement. Please guide me.
>
> --
> View this message in context: http://old.nabble.com/Distributed-sorting-using-Hadoop-tp32876787p32876787.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
