hadoop-mapreduce-issues mailing list archives

From "jaehoon ko (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-5946) Last spill of map task is not necessary for final merge
Date Fri, 27 Jun 2014 00:45:26 GMT
jaehoon ko created MAPREDUCE-5946:
-------------------------------------

             Summary: Last spill of map task is not necessary for final merge
                 Key: MAPREDUCE-5946
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5946
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: performance, security
    Affects Versions: 2.4.0
            Reporter: jaehoon ko
            Assignee: jaehoon ko


In a map task, the final merge starts only after the last spill has been completely written
to disk. This is neither necessary nor efficient: the last spill has to be read back for the
merge soon afterwards, probably immediately, because spills are merged in order of size and
the last spill is likely to be the smallest. The OS page cache is not a reliable answer,
because caching is opportunistic and the spill may already have been evicted.
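
To illustrate the reasoning above, here is a minimal sketch. It is not Hadoop code: the class
SpillMergeOrderSketch, its two methods, and the byte counts are all hypothetical. It assumes a
simplified model in which a spill is written every time a fixed threshold of map output has
accumulated and the remainder is flushed as the last spill. Under that assumption the last
spill is the smallest, so a smallest-first merge reads it back before any other spill.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; none of these names exist in Hadoop.
class SpillMergeOrderSketch {

    // Sizes of the spills in write order, assuming a spill is written each time
    // thresholdBytes of output has accumulated and the rest is flushed at the end.
    static List<Long> spillSizes(long mapOutputBytes, long thresholdBytes) {
        List<Long> sizes = new ArrayList<>();
        long remaining = mapOutputBytes;
        while (remaining > thresholdBytes) {
            sizes.add(thresholdBytes);
            remaining -= thresholdBytes;
        }
        if (remaining > 0) {
            sizes.add(remaining); // final flush: whatever is left in the buffer
        }
        return sizes;
    }

    // Index (in write order) of the spill that a smallest-first merge reads first.
    // Expects a non-empty list; ties go to the earliest spill.
    static int firstSpillToMerge(List<Long> sizes) {
        int smallest = 0;
        for (int i = 1; i < sizes.size(); i++) {
            if (sizes.get(i) < sizes.get(smallest)) {
                smallest = i;
            }
        }
        return smallest;
    }

    public static void main(String[] args) {
        // Illustrative numbers: 350 units of map output, spill threshold of 80.
        List<Long> sizes = spillSizes(350, 80);
        System.out.println("spill sizes in write order: " + sizes);
        System.out.println("spill read first by the merge: " + firstSpillToMerge(sizes));
    }
}
```

With these illustrative numbers the task writes four spills of 80 and a final one of 30, and
the final spill (index 4) is the first one the merge reads. That is the spill the current
behavior writes to disk and then reloads right away.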

I'm starting to work on it. Please give me your thoughts.



--
This message was sent by Atlassian JIRA
(v6.2#6252)