hadoop-user mailing list archives

From Stephen Boesch <java...@gmail.com>
Subject Re: Collect, Spill and Merge phases insight
Date Tue, 16 Jul 2013 16:31:47 GMT
Great questions. I am also looking forward to answers from the experts here.

2013/7/16 Felix.徐 <ygnhzeus@gmail.com>

> Hi all,
> I am trying to understand the Collect, Spill and Merge process in the map
> task. I have read some of the documentation but still have a few questions.
> Here is my understanding about the spill phase in map:
> 1. The collect function adds a record to the buffer.
> 2. If the buffer fills past a threshold (determined by parameters such as
> io.sort.mb), the spill phase begins.
> 3. The spill phase consists of three actions: sort, combine and compression.
> 4. A spill may be performed multiple times, so several spill files may be
> generated.
> 5. If there is more than one spill file, the Merge phase begins and merges
> these files into one large file.
> If I have misunderstood any of these phases, please correct me.
> Thanks!
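
The five steps above can be sketched as a toy model. This is only an illustration of the flow as the poster describes it, not Hadoop code: the class ToyMapBuffer, its record-count threshold and its byte-sum partitioner are all hypothetical, and the combine and compression steps are left out. It also assumes the partition is computed when a record is collected, which is exactly the point the first question below asks about.

```python
# Toy model of the collect/spill flow described in the quoted message.
# Everything here is hypothetical and simplified; it is not Hadoop code.

class ToyMapBuffer:
    def __init__(self, spill_threshold, num_partitions):
        # Threshold is counted in records here; the real setting is byte-based.
        self.spill_threshold = spill_threshold
        self.num_partitions = num_partitions
        self.buffer = []   # in-memory records waiting to be spilled
        self.spills = []   # each entry stands in for one spill file

    def collect(self, key, value):
        # Assumption: the partition is decided at collect time.
        # A deterministic stand-in partitioner: sum of the key's bytes.
        partition = sum(key.encode("utf-8")) % self.num_partitions
        self.buffer.append((partition, key, value))
        if len(self.buffer) >= self.spill_threshold:
            self.spill()

    def spill(self):
        if not self.buffer:
            return
        # Sort by partition first, then by key (combine/compress omitted).
        self.spills.append(sorted(self.buffer))
        self.buffer = []

    def close(self):
        # Flush whatever is left so that every record ends up in a spill.
        self.spill()
        return self.spills


buf = ToyMapBuffer(spill_threshold=3, num_partitions=2)
for word in ["d", "a", "c", "b", "e"]:
    buf.collect(word, 1)
spills = buf.close()
print(len(spills), "spills:", spills)
```

With five records and a threshold of three, the model produces two sorted spills, which is the situation in which step 5 (the merge) would apply.
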
> And my questions are:
> 1. Where is the partition calculated (in Collect or in Spill)? Does
> Collect simply append a record to the buffer and check whether the buffer
> should be spilled?
> 2. In the Merge phase, since the spill files are compressed, do they need
> to be decompressed and then compressed again? Since Merge may take more
> than one round, are the intermediate files compressed as well?
> 3. Is the Merge phase on the Map side almost the same as on the Reduce
> side (external merge sort combined with a min-heap)?
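
For readers unfamiliar with the technique named in the third question, here is a minimal sketch of a k-way merge of sorted runs driven by a min-heap. It shows the general algorithm only and makes no claim about how Hadoop implements its merge; the function name merge_sorted_runs and the (partition, key, value) record layout are assumptions made for this example, and the runs are in-memory lists rather than files on disk.

```python
import heapq

_END = object()  # sentinel marking an exhausted run


def merge_sorted_runs(runs):
    """Merge already-sorted runs into one sorted list using a min-heap."""
    heap = []
    for run_index, run in enumerate(runs):
        iterator = iter(run)
        first = next(iterator, _END)
        if first is not _END:
            # run_index breaks ties, so the iterators are never compared.
            heap.append((first, run_index, iterator))
    heapq.heapify(heap)

    merged = []
    while heap:
        # The heap root is always the smallest record not yet emitted.
        record, run_index, iterator = heapq.heappop(heap)
        merged.append(record)
        following = next(iterator, _END)
        if following is not _END:
            heapq.heappush(heap, (following, run_index, iterator))
    return merged


# Two sorted runs of hypothetical (partition, key, value) records.
runs = [
    [(0, "d", 1), (1, "a", 1), (1, "c", 1)],
    [(0, "b", 1), (1, "e", 1)],
]
merged = merge_sorted_runs(runs)
print(merged)
```

The heap holds at most one record per run at any time, so memory use grows with the number of runs rather than with the total number of records, which is what makes the approach suitable for external sorting.
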
