hadoop-user mailing list archives

From Martin Dobmeier <martin.dobme...@gmail.com>
Subject Re: How does map-merge work exactly?
Date Tue, 18 Sep 2012 14:02:54 GMT
>> What exactly is a segment? Is it the number of spills?
> A segment in this context is a fraction of spill output for a particular
> reduce. Each spill contains a segment for every reduce.

Ah, alright. But why is Hadoop telling me that there are 117 segments given
that only 96 reducers have been configured?
(btw, I'm using Hadoop 1.0.0)

>> Why are only 54 segments merged instead of "io.sort.factor" segments?
>> (io.sort.factor determines the number of files to merge during a pass,
>> right?)
> The intermediate merge of 54 files to 1 reduces the number of files to
> 117 - 53 = 64 segments. The final merge is over 64 segments.

Ok, that makes sense.
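
For reference, the arithmetic above can be reproduced with a small sketch.
It is not taken from the thread: the class and method names are
illustrative, the sizing rule is only modeled on how the first merge pass
appears to be chosen so that later passes merge exactly "io.sort.factor"
segments, and io.sort.factor = 64 is an assumption inferred from the
numbers 117 and 54 (the configured value is never stated here).

```java
public class MergePassSketch {

    // How many segments to merge in a given pass. Only the first pass is
    // shortened, so that every later pass can merge exactly `factor` segments.
    static int passFactor(int factor, int passNo, int numSegments) {
        if (passNo > 1 || numSegments <= factor || factor == 1) {
            return factor;
        }
        int mod = (numSegments - 1) % (factor - 1);
        return mod == 0 ? factor : mod + 1;
    }

    // Merging `merged` segments into one removes (merged - 1) segments.
    static int remainingAfterPass(int numSegments, int merged) {
        return numSegments - merged + 1;
    }

    public static void main(String[] args) {
        int factor = 64;    // ASSUMPTION: io.sort.factor, inferred, not from the thread
        int segments = 117; // from the "Merging 117 segments" log line
        int first = passFactor(factor, 1, segments);
        int left = remainingAfterPass(segments, first);
        System.out.println("first pass merges " + first + ", leaving " + left);
    }
}
```

Under that assumption the first pass merges 54 segments and leaves
117 - 53 = 64, which matches the log output discussed above.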

>> Why is the merge performed "number of reducers" times? (I'm counting the
>> phrase "Merging 117 segments" exactly 96 times)
> Each invocation of the merger is combining all the output assigned to a
> reduce by the partitioner.

So the merger is called "number of reducers" times because it combines the
data for a particular reducer which is spread over all spill files, right?
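
To make that reading concrete, here is a minimal sketch of one merge per
reducer across all spills. It is only an illustration under assumptions:
the class name, the nested-list layout, and the use of plain integers as
keys are hypothetical, and it does a single k-way merge per partition
without the multi-pass behaviour governed by io.sort.factor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class PerReducerMergeSketch {

    // spills.get(s).get(p) is the sorted segment that spill s holds for
    // partition (reducer) p. Returns one merged, sorted list per partition.
    static List<List<Integer>> mergeAll(List<List<List<Integer>>> spills,
                                        int numPartitions) {
        List<List<Integer>> out = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) { // one merger call per reducer
            // Min-heap of {key, spillIndex, offsetInSegment}.
            PriorityQueue<int[]> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
            for (int s = 0; s < spills.size(); s++) {
                List<Integer> seg = spills.get(s).get(p);
                if (!seg.isEmpty()) {
                    heap.add(new int[] {seg.get(0), s, 0});
                }
            }
            List<Integer> merged = new ArrayList<>();
            while (!heap.isEmpty()) {
                int[] top = heap.poll();
                merged.add(top[0]);
                List<Integer> seg = spills.get(top[1]).get(p);
                int next = top[2] + 1;
                if (next < seg.size()) {
                    heap.add(new int[] {seg.get(next), top[1], next});
                }
            }
            out.add(merged);
        }
        return out;
    }
}
```

In this picture the number of segments fed to each merge equals the number
of spills, and the number of merges equals the number of reducers.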

Martin

On Mon, Sep 17, 2012 at 10:21 AM, Chris Douglas <cdouglas@apache.org> wrote:

> On Thu, Sep 13, 2012 at 7:04 AM, Martin Dobmeier
> <martin.dobmeier@gmail.com> wrote:
> > What exactly is a segment? Is it the number of spills?
>
> A segment in this context is a fraction of spill output for a
> particular reduce. Each spill contains a segment for every reduce.
>
> > What does "0 segments left" mean? Does it mean that the merge could be
> > performed on the first pass?
> > Why are only 54 segments merged instead of "io.sort.factor" segments?
>
> The intermediate merge of 54 files to 1 reduces the number of files to
> 117 - 53 = 64 segments. The final merge is over 64 segments.
>
> > (io.sort.factor determines the number of files to merge during a pass,
> > right?)
> > Why is the merge performed "number of reducers" times? (I'm counting the
> > phrase "Merging 117 segments" exactly 96 times)
>
> Each invocation of the merger is combining all the output assigned to
> a reduce by the partitioner. -C
>
