hadoop-user mailing list archives

From Martin Dobmeier <martin.dobme...@gmail.com>
Subject How does map-merge work exactly?
Date Thu, 13 Sep 2012 14:04:36 GMT
Hi all,

I'm quite confused about the spill/sort/merge process that runs during the
map phase.

Here are some stats:
- io.sort.mb = 256 MB (80% spill threshold)
- io.sort.factor = 64
- spills performed during Map: 117
- number of reducers: 96

Now I'm having real trouble understanding the following log output.

...
mapred.Merger: Merging 117 sorted segments
mapred.Merger: Down to the last merge-pass, with 0 segments left of total size: 0 bytes
...
mapred.Merger: Merging 117 sorted segments
mapred.Merger: Merging 54 intermediate segments out of a total of 56
mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 67119046 bytes
...
mapred.Merger: Merging 117 sorted segments
mapred.Merger: Merging 54 intermediate segments out of a total of 117
mapred.Merger: Down to the last merge-pass, with 64 segments left of total size: 1609011189 bytes
...
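A simplified Python model of how these three traces could arise, assuming the first-pass rule used by Hadoop 1.x's `Merger` (`getPassFactor`): only the first pass is shortened, to `((n - 1) mod (factor - 1)) + 1` segments, so that every later pass merges exactly `io.sort.factor` segments. It also assumes empty segments (spills that hold no records for the partition) are dropped without being merged. The function names and the per-partition non-empty counts (0, 56, 117) are hypothetical, inferred from the log lines rather than taken from Hadoop. With 117 segments and a factor of 64, the first pass comes out at 54.

```python
from typing import List


def pass_factor(factor: int, pass_no: int, num_segments: int) -> int:
    """Segments to merge in this pass (modelled on Hadoop 1.x Merger)."""
    # Later passes, small inputs and factor 1 use the full io.sort.factor.
    if pass_no > 1 or num_segments <= factor or factor == 1:
        return factor
    # Shrink only the first pass so every later pass merges exactly `factor`.
    mod = (num_segments - 1) % (factor - 1)
    return factor if mod == 0 else mod + 1


def simulate_merge(num_sorted: int, num_nonempty: int, factor: int) -> List[str]:
    """Return simplified log lines for the merge of ONE partition.

    num_sorted:   segments handed to the merger (one per spill file)
    num_nonempty: how many of them actually hold records (hypothetical input)
    Assumes empty segments are discarded before any real merge work happens.
    """
    if factor < 2:
        raise ValueError("io.sort.factor must be at least 2")
    log = [f"Merging {num_sorted} sorted segments"]
    counted = num_sorted  # the first-pass factor uses the count incl. empties
    live = num_nonempty   # segments that will actually be merged
    pass_no = 1
    while True:
        f = pass_factor(factor, pass_no, counted)
        if live <= f:
            log.append(f"Down to the last merge-pass, with {live} segments left")
            return log
        log.append(f"Merging {f} intermediate segments out of a total of {live}")
        live = live - f + 1  # f inputs collapse into one intermediate segment
        counted = live
        pass_no += 1


if __name__ == "__main__":
    # Hypothetical partitions matching the three traces: 0, 56, 117 non-empty.
    for nonempty in (0, 56, 117):
        print("\n".join(simulate_merge(117, nonempty, 64)))
        print()
```

Under this model, 117 - 54 + 1 = 64 segments remain after the first pass, which would fit in one final pass of 64. If the routine runs once per partition (one per reducer), it would also print "Merging 117 sorted segments" 96 times.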

What exactly is a segment? Does each segment correspond to one spill?
What does "0 segments left" mean? Does it mean that the merge could be
completed in a single pass?
Why are only 54 segments merged instead of io.sort.factor (64) segments?
(io.sort.factor sets the maximum number of files merged in a single pass,
right?)
Why is the merge performed once per reducer? (The line "Merging 117 sorted
segments" appears exactly 96 times.)

Thanks a lot!
Martin
