hadoop-user mailing list archives

From Chris Douglas <cdoug...@apache.org>
Subject Re: How does map-merge work exactly?
Date Mon, 17 Sep 2012 08:21:17 GMT
On Thu, Sep 13, 2012 at 7:04 AM, Martin Dobmeier
<martin.dobmeier@gmail.com> wrote:
> What exactly is a segment? Is it the number of spills?

A segment in this context is the portion of a spill's output destined
for a particular reduce. Each spill contains one segment for every reduce.
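To make the spill/segment relationship concrete, here is a minimal Python sketch. It is not Hadoop code: the `partition` function (a CRC32 hash of the key modulo the reduce count), the `spill` helper and the sample records are all hypothetical. It only models the idea that every spill is split into one sorted segment per reduce, including empty ones.

```python
import zlib
from typing import List, Tuple

Record = Tuple[str, int]  # (key, value); illustrative record type

def partition(key: str, num_reduces: int) -> int:
    # Hypothetical partitioner: a stable hash of the key modulo the reduce count.
    return zlib.crc32(key.encode("utf-8")) % num_reduces

def spill(buffer: List[Record], num_reduces: int) -> List[List[Record]]:
    """Split one in-memory buffer into sorted segments, one per reduce."""
    # Every reduce gets a segment, even if it ends up empty.
    segments: List[List[Record]] = [[] for _ in range(num_reduces)]
    for key, value in buffer:
        segments[partition(key, num_reduces)].append((key, value))
    for seg in segments:
        seg.sort()
    return segments

# Three spills with four reduces yield 3 * 4 = 12 segments in total.
buffers = [[("apple", 1), ("pear", 2)], [("fig", 3)], [("kiwi", 4), ("plum", 5), ("apple", 6)]]
spills = [spill(b, num_reduces=4) for b in buffers]
print(sum(len(s) for s in spills))  # 12
```

With S spills and R reduces there are S * R segments, and each reduce owns S of them.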

> What does "0 segments left" mean? Does it mean that the merge could be
> performed on the first pass?
> Why are only 54 segments merged instead of "io.sort.factor" segments?

The intermediate merge of 54 segments into 1 reduces the count from 117
to 117 - 53 = 64 segments. The final merge is over those 64 segments.
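The 54 can be reproduced arithmetically. The sketch below is modeled on my understanding of the pass-factor logic in Hadoop's merger, and it assumes io.sort.factor = 64 (the message does not state the value, but 64 is consistent with the numbers). The idea: the first pass merges only enough segments that every later pass, including the final one, can use the full factor. The names `pass_factor` and `merge_plan` are illustrative, not Hadoop API.

```python
from typing import List

def pass_factor(factor: int, pass_no: int, num_segments: int) -> int:
    """How many segments to merge in this pass (sketch, not Hadoop's API)."""
    if pass_no > 1 or num_segments <= factor:
        return factor
    # Merge just enough in the first pass that the remaining count
    # is exactly reachable by later passes of size `factor`.
    mod = (num_segments - 1) % (factor - 1)
    return factor if mod == 0 else mod + 1

def merge_plan(num_segments: int, factor: int) -> List[int]:
    """Segment counts merged in each pass; the last entry is the final merge."""
    if factor < 2:
        raise ValueError("factor must be at least 2")
    plan = []
    pass_no = 1
    n = num_segments
    while n > factor:
        k = pass_factor(factor, pass_no, n)
        plan.append(k)
        n -= k - 1  # k segments become 1
        pass_no += 1
    plan.append(n)
    return plan

print(merge_plan(117, 64))  # [54, 64]
```

Here (117 - 1) % (64 - 1) = 53, so the first pass merges 53 + 1 = 54 segments, leaving 64 for the final merge.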

> (io.sort.factor determines the number of files to merge during a pass,
> right?)
> Why is the merge performed "number of reducers" times? (I'm counting the
> phrase "Merging 117 segments" exactly 96 times)

Each invocation of the merger combines all the output that the
partitioner assigned to one reduce, so the merge runs once per reduce. -C
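A minimal Python sketch of "one merger invocation per reduce", assuming each spill is represented as a list of already-sorted segments indexed by reduce number (a hypothetical layout, not Hadoop's on-disk format). It uses `heapq.merge` for a single k-way merge and ignores the multi-pass io.sort.factor limit. In the numbers quoted above, 117 segments per merge repeated 96 times would correspond to 117 spills and 96 reduces.

```python
import heapq
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # (key, value); illustrative record type

def merge_map_output(spills: List[List[List[Record]]]) -> Dict[int, List[Record]]:
    """Merge map output once per reduce (sketch; single pass, no io.sort.factor limit)."""
    num_reduces = len(spills[0])
    merged: Dict[int, List[Record]] = {}
    for r in range(num_reduces):
        # One merger invocation: segment r from every spill, k-way merged.
        segments = [spill[r] for spill in spills]
        merged[r] = list(heapq.merge(*segments))
    return merged

# Two spills, two reduces; each inner list is one sorted segment.
spills = [
    [[("a", 1), ("c", 3)], [("b", 2)]],
    [[("a", 4)], [("d", 5)]],
]
print(merge_map_output(spills))
# {0: [('a', 1), ('a', 4), ('c', 3)], 1: [('b', 2), ('d', 5)]}
```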
