hadoop-hdfs-user mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject RE: MapReduce shuffle algorithm
Date Wed, 22 May 2013 14:34:46 GMT
Thanks!  I will read the elephant book more thoroughly.

From: Bertrand Dechoux [mailto:dechouxb@gmail.com]
Sent: Tuesday, May 21, 2013 1:22 PM
To: user@hadoop.apache.org
Subject: Re: MapReduce shuffle algorithm

An introduction to the subject can be found in the best-known reference:

Hadoop: The Definitive Guide, 3rd Edition
Storage and Analysis at Internet Scale
By Tom White <http://shop.oreilly.com/product/0636920021773.do#tab_04>
Publisher: O'Reilly Media / Yahoo Press
Released: May 2012
Chapter 6 How MapReduce Works -> Shuffle and Sort -> around page 208

After reading this, you should have a good understanding of the architecture and know that
there is indeed no "shuffle phase replication factor" (cf. your question on another thread).
For the technical details, reading the code is probably the next step.

On Tue, May 21, 2013 at 6:58 PM, John Lilley <john.lilley@redpoint.net> wrote:
I am very interested in a deep understanding of the MapReduce "Shuffle" phase algorithm and
implementation.  Are there whitepapers I could read for an explanation?  Or another mailing
list for this question?  Obviously there is the code ;-)
