hadoop-common-dev mailing list archives

From Arun C Murthy <...@yahoo-inc.com>
Subject Re: Why mergeParts() is not parallel with collect() on map?
Date Tue, 03 May 2011 07:52:35 GMT

On May 2, 2011, at 11:30 PM, elton sky wrote:

> In the shuffle phase, the reduce copies output from the maps. In parallel,
> InMemoryMerger and OnDiskMerger merge the copied files if there are too many.
> But on the map side, mergeParts() happens only after collect() has finished.
> Why don't we run spill merging in parallel with collect()/sort&spill on the map?

Certainly feasible, please feel free to open a jira for the enhancement.

However, typically, the map's merge is much less intensive than the  
reduce's merge. As a result, this might just bloat the code for little  
gain, except in the most extreme cases.
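
For anyone following the thread, here is a minimal toy sketch of the map-side ordering being discussed: records are collected, sorted and spilled whenever the buffer fills, and the spills are merged only once at the end. It is not the actual MapTask code. The class name MapSideSpillSketch, the integer keys, the bufferLimit parameter and the in-memory lists standing in for on-disk spill files are all assumptions made for illustration. Only the method names collect, sortAndSpill and mergeParts echo the names used in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Toy model only: in-memory lists stand in for on-disk spill files.
final class MapSideSpillSketch {

    /** Read position inside one sorted spill. */
    private static final class Cursor {
        final List<Integer> run;
        int pos = 0;
        Cursor(List<Integer> run) { this.run = run; }
        int head() { return run.get(pos); }
    }

    private final int bufferLimit;                 // hypothetical buffer size, in records
    private final List<Integer> buffer = new ArrayList<>();
    private final List<List<Integer>> spills = new ArrayList<>();

    MapSideSpillSketch(int bufferLimit) {
        this.bufferLimit = bufferLimit;
    }

    /** Buffer one key; sort and spill when the buffer is full. */
    void collect(int key) {
        buffer.add(key);
        if (buffer.size() >= bufferLimit) {
            sortAndSpill();
        }
    }

    /** Sort the buffered keys and store them as one spill. */
    private void sortAndSpill() {
        if (buffer.isEmpty()) {
            return;
        }
        List<Integer> run = new ArrayList<>(buffer);
        Collections.sort(run);
        spills.add(run);
        buffer.clear();
    }

    int spillCount() {
        return spills.size();
    }

    /** Runs only after all collect() calls: flush, then k-way merge the spills. */
    List<Integer> mergeParts() {
        sortAndSpill(); // flush whatever is still buffered
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.head(), b.head()));
        for (List<Integer> run : spills) {
            heap.add(new Cursor(run)); // spills are never empty by construction
        }
        List<Integer> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            merged.add(c.head());
            c.pos++;
            if (c.pos < c.run.size()) {
                heap.add(c); // re-insert with its next key as the new head
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        MapSideSpillSketch map = new MapSideSpillSketch(3);
        for (int key : new int[] {9, 1, 4, 3, 10, 2, 7, 5, 6}) {
            map.collect(key);
        }
        System.out.println(map.spillCount() + " spills, merged: " + map.mergeParts());
    }
}
```

With a buffer limit of 3, the nine keys above produce three spills, and the single merge at the end yields one sorted run. Running the merge alongside collect(), as proposed, would mean starting it on a separate thread while new spills are still being produced. This sketch deliberately does not attempt that.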

