hadoop-mapreduce-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Josh Patterson <j...@cloudera.com>
Subject Re: How does the framework sort data?
Date Tue, 03 Aug 2010 19:50:36 GMT
On Mon, Aug 2, 2010 at 2:22 PM, Parimi, Nagender <parimi@amazon.com> wrote:
> Hi,
>
>
>
> This is an admittedly naïve question, but I’ve been unable to find a
> comprehensive answer online. I have gone through the tutorial a few times
> (http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html), and my
> question is simple: who or what performs the sort in MapReduce? The tutorial
> above states the following in a few places -
>
>
>
> “The framework sorts the outputs of the maps, which are then input to the
> reduce tasks”
>
>
>
> “The Mapper outputs are sorted and then partitioned per Reducer”
>
>
>
> But this glosses over an important detail - who’s sorting the mappers’
> outputs and how? Sorting huge amounts of data isn’t cheap, hence my
> interest.
>
>
>
> The tutorial mentions that the Reducer performs 3-4 steps - Shuffle, Sort, a
> possible Secondary Sort on values, and lastly Reduce. After which it states
> -
>
>
>
> “The shuffle and sort phases occur simultaneously; while map-outputs are
> being fetched they are merged”
>
>
>
> I’ve been told that mappers sort their outputs and then merge sort is used
> to combine them. So is it some randomly chosen reducers that perform the
> merges? I would love to know more about the details, anyone know? If you
> know of a doc that explains it, I’d appreciate if you could pass it along!

Chapter 6 of Tom White's Hadoop book (Oreilly press) covers this really well.

Paraphrasing from it:

- Each map task writes to a buffer, which periodically is spilled to disk
- the map task output is subdivided into partitions based on which
reducer will process that section of data
- within partition, each partition is sorted by key
- a specified combiner may be run here
- map output may be compressed to save on network overhead
- map output is now sitting on the disk of each map task machine
- each reduce task starts copying the map task output as soon as its
available, they dont all start at the same time generally (copy phase)
- as the copies are accumulated locally, they are merged into larger,
sorted files
- once all map outputs have been copied for the partition, the reduce
task moves into the sort phase --- which merges the map outputs
maintaining sort order
- the reduce phase is invoked for each key in the sorted output

Josh Patterson
Cloudera

>
>
>
> thanks,
>
> Nagender

Mime
View raw message