hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-5831) Implement memory-to-memory merge in the reduce
Date Thu, 14 May 2009 07:32:45 GMT
Implement memory-to-memory merge in the reduce

                 Key: HADOOP-5831
                 URL: https://issues.apache.org/jira/browse/HADOOP-5831
             Project: Hadoop Core
          Issue Type: Improvement
          Components: mapred
            Reporter: Arun C Murthy
            Assignee: Arun C Murthy
             Fix For: 0.21.0

HADOOP-3446 changed the reduce so that it no longer flushes the in-memory shuffled map-outputs to disk
before feeding them to the reduce. However, for latency-sensitive applications with lots of memory, such
as terasort, this hurts performance: the fan-in of the final in-memory merge is too large (all 8000
map-outputs were in memory).

When I put in an intermediate memory-to-memory merge for terasort's reduce (thereby avoiding
disk I/O) to cut the fan-in from 8000 to <100, the 'reduce' phase (including the local datanode write)
sped up 2.5x (from 10s to 4s).
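To illustrate the idea, here is a minimal sketch of a two-level in-memory merge. It first merges groups of sorted in-memory segments into larger in-memory segments, so the final merge sees at most a chosen fan-in. It is not Hadoop's actual implementation. The function names (`group_for_fan_in`, `two_level_merge`), the `max_fan_in` parameter, and the use of plain integers in place of serialized key/value records are all assumptions made for illustration.

```python
import heapq
from typing import List, Sequence


def group_for_fan_in(segments: List[Sequence[int]], max_fan_in: int) -> List[List[Sequence[int]]]:
    """Hypothetical helper: split segments into at most max_fan_in groups."""
    if max_fan_in < 2:
        raise ValueError("max_fan_in must be at least 2")
    # Ceiling division: how many segments each intermediate merge absorbs
    # so that the number of groups does not exceed max_fan_in.
    group_size = max(1, -(-len(segments) // max_fan_in))
    return [segments[i:i + group_size] for i in range(0, len(segments), group_size)]


def two_level_merge(segments: List[Sequence[int]], max_fan_in: int) -> List[int]:
    """Hypothetical memory-to-memory merge followed by a final low-fan-in merge."""
    # Intermediate pass: each group is k-way merged into one sorted in-memory
    # segment (no disk I/O involved; everything stays in Python lists).
    intermediate = [list(heapq.merge(*group)) for group in group_for_fan_in(segments, max_fan_in)]
    # Final pass: fan-in is now len(intermediate) <= max_fan_in.
    return list(heapq.merge(*intermediate))


if __name__ == "__main__":
    segs = [[1, 4, 9], [2, 3, 10], [0, 5], [6, 7, 8]]
    print(two_level_merge(segs, max_fan_in=2))  # [0, 1, 2, ..., 10]
```

With 8000 segments and `max_fan_in=100`, each intermediate merge absorbs 80 segments, which mirrors the 8000-to-<100 reduction described above. Real map-outputs would be compared with the job's key comparator rather than as integers.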

This message is automatically generated by JIRA.
