hadoop-common-dev mailing list archives

From elton sky <eltonsky9...@gmail.com>
Subject Why mergeParts() is not parallel with collect() on map?
Date Tue, 03 May 2011 06:30:11 GMT
In the shuffle phase, the reducer copies output from the maps. In parallel,
the InMemoryMerger and OnDiskMerger merge the copied files when there are
too many of them. On the map side, however, mergeParts() runs only after
collect() has finished. Why don't we merge spills in parallel with
collect()/sort&spill on the map side?
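
To make the question concrete, here is a minimal sketch of the overlap
being asked about. It is not Hadoop code: the class
ConcurrentSpillMergeSketch, its methods, and the DONE sentinel are all
hypothetical, spills are modelled as in-memory sorted lists rather than
files, and each spill is folded into a single accumulator instead of going
through a multi-way merge. It only shows a merger thread consuming spills
while the collecting thread is still producing them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

public class ConcurrentSpillMergeSketch {
    // Hypothetical sentinel: tells the merger that collecting has finished.
    private static final List<Integer> DONE = new ArrayList<>();

    // Standard two-way merge of two already sorted runs.
    static List<Integer> mergeSorted(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            out.add(a.get(i) <= b.get(j) ? a.get(i++) : b.get(j++));
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    // Stand-in for collect()/sort&spill: each buffer is sorted and handed
    // over as a "spill" while a background thread merges earlier spills.
    static List<Integer> collectWithBackgroundMerge(List<List<Integer>> buffers) {
        BlockingQueue<List<Integer>> spills = new LinkedBlockingQueue<>();
        AtomicReference<List<Integer>> result = new AtomicReference<>();

        Thread merger = new Thread(() -> {
            List<Integer> merged = new ArrayList<>();
            try {
                // Merge each spill as soon as it arrives, until DONE is seen.
                for (List<Integer> s = spills.take(); s != DONE; s = spills.take()) {
                    merged = mergeSorted(merged, s);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            result.set(merged);
        });
        merger.start();

        try {
            for (List<Integer> buffer : buffers) {
                List<Integer> spill = new ArrayList<>(buffer);
                Collections.sort(spill);   // the "sort" part of sort&spill
                spills.put(spill);         // the "spill" part
            }
            spills.put(DONE);
            merger.join();                 // wait for the final merged run
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while collecting", e);
        }
        return result.get();
    }

    public static void main(String[] args) {
        List<List<Integer>> buffers = Arrays.asList(
                Arrays.asList(5, 1, 9), Arrays.asList(4, 4, 2), Arrays.asList(8, 0));
        System.out.println(collectWithBackgroundMerge(buffers));
    }
}
```

Whether this kind of overlap would pay off in a real map task is the open
question; the sketch ignores disk I/O contention and the memory the merger
would take away from the collect buffer.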

-Elton
