hadoop-mapreduce-user mailing list archives

From "Ravi Gummadi" <gr...@yahoo-inc.com>
Subject Re: Spill and Map Output
Date Wed, 22 Dec 2010 21:07:52 GMT
Each map task generates a single intermediate file (i.e. the map output file). If the map had to
spill more than once, this file is obtained by merging the multiple spills.
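The merge step can be modeled with a minimal Python sketch. This is not Hadoop's actual code. It assumes each spill is already split into partitions (one per reducer) and that records inside each partition are sorted by key. The names `merge_spills`, `Record` and `Spill` are hypothetical. Under those assumptions, merging reduces to a k-way merge of each partition across all spills, so every partition still ends up as one sorted run in the single output.

```python
import heapq
from typing import List, Tuple

# Hypothetical types for illustration only.
Record = Tuple[str, str]        # (key, value)
Spill = List[List[Record]]      # spill[p] = records for partition p, sorted by key


def merge_spills(spills: List[Spill], num_partitions: int) -> List[List[Record]]:
    """Merge several spills into one map output, partition by partition.

    Assumes every spill contains all partitions and that each partition's
    records are already sorted by key, so a k-way merge per partition
    produces one sorted run per partition.
    """
    merged: List[List[Record]] = []
    for p in range(num_partitions):
        # Collect partition p from every spill and merge the sorted runs by key.
        runs = [spill[p] for spill in spills]
        merged.append(list(heapq.merge(*runs, key=lambda rec: rec[0])))
    return merged
```

For example, with two spills and two partitions, `merge_spills([[[("a", "1"), ("c", "3")], [("x", "9")]], [[("b", "2")], [("w", "8"), ("y", "7")]]], 2)` keeps partition 0 and partition 1 separate while interleaving each partition's records in key order. The partitions are never mixed with one another; only spills are combined.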

The index file gives the offset and length for each reducer. The offset is the position in the
map output file where the input data for that particular reducer starts, and the length is the
size of that data starting from the offset.


On 12/23/10 2:17 AM, "Pedro Costa" <psdc1978@gmail.com> wrote:


1 - I would like to understand how a partition works in MapReduce. I
know that MapReduce contains the IndexRecord class, which indicates
the length of something. Is it the length of a partition or of a
spill?

2 - For a large map output, can a partition be a set of spills, or is
a spill simply the same thing as a partition?

