hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-657) Free temporary space should be modelled better
Date Tue, 31 Oct 2006 00:26:16 GMT
Free temporary space should be modelled better

                 Key: HADOOP-657
                 URL: http://issues.apache.org/jira/browse/HADOOP-657
             Project: Hadoop
          Issue Type: Improvement
          Components: mapred
    Affects Versions: 0.7.2
            Reporter: Owen O'Malley
         Assigned To: Owen O'Malley

Currently, there is a configurable amount of disk space that must be free before a task
tracker will accept a new task. However, a single fixed threshold isn't a very good model of
what a task is actually likely to use. I'd like to estimate each task's scratch-space
requirement as:

Map tasks:  totalInputSize * conf.getFloat("map.output.growth.factor", 1.0) / numMaps
Reduce tasks: totalInputSize * 2 * conf.getFloat("map.output.growth.factor", 1.0) / numReduces

where totalInputSize is the size of all the maps inputs for the given job.
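The two estimates above could be sketched roughly as follows. This is an illustrative sketch only, not existing Hadoop code; the class and method names are made up, and the growth factor stands in for `conf.getFloat("map.output.growth.factor", 1.0)`:

```java
// Illustrative sketch of the proposed per-task scratch-space estimates.
// Class and method names are hypothetical, not actual Hadoop APIs.
public class TaskSpaceEstimator {
    // Stand-in for conf.getFloat("map.output.growth.factor", 1.0):
    // size of a map's transient output relative to its input.
    private final double growthFactor;

    public TaskSpaceEstimator(double growthFactor) {
        this.growthFactor = growthFactor;
    }

    // Map task estimate: totalInputSize * growthFactor / numMaps
    public long mapTaskEstimate(long totalInputSize, int numMaps) {
        return (long) (totalInputSize * growthFactor / numMaps);
    }

    // Reduce task estimate: totalInputSize * 2 * growthFactor / numReduces
    public long reduceTaskEstimate(long totalInputSize, int numReduces) {
        return (long) (totalInputSize * 2 * growthFactor / numReduces);
    }
}
```

With a growth factor of 1.0, a 1000-byte job split over 10 maps yields a 100-byte estimate per map, and over 4 reduces a 500-byte estimate per reduce (the factor of 2 accounts for both the fetched map output and the merged/sorted copy).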

To start a new task, require
  newTaskAllocation + (sum over running tasks of (1.0 - done) * allocation) <= 
       free disk * conf.getFloat("mapred.max.scratch.allocation", 0.90);

So in English, we will model the expected sizes of tasks and only start tasks that should leave
us a 10% margin. With:
map.output.growth.factor -- the size of the transient data relative to the map inputs
mapred.max.scratch.allocation -- the maximum fraction of free disk we want to allocate to tasks.
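The admission check could look roughly like the sketch below. Again the names are hypothetical; the check accepts a new task only when its allocation, plus the remaining (undone) allocations of running tasks, stays within the scratch budget, matching the "leave a 10% margin" intent described above:

```java
// Illustrative sketch of the proposed admission check; names are hypothetical.
public class ScratchSpaceCheck {
    /**
     * remainingFractions[i] is (1.0 - done) for running task i;
     * allocations[i] is that task's estimated scratch space;
     * maxScratchFraction stands in for mapred.max.scratch.allocation (0.90).
     */
    public static boolean canAccept(long newTaskAllocation,
                                    double[] remainingFractions,
                                    long[] allocations,
                                    long freeDisk,
                                    double maxScratchFraction) {
        double committed = newTaskAllocation;
        for (int i = 0; i < allocations.length; i++) {
            committed += remainingFractions[i] * allocations[i];
        }
        // Accept only if projected usage stays within the budget, leaving
        // a (1 - maxScratchFraction) margin of the free disk untouched.
        return committed <= freeDisk * maxScratchFraction;
    }
}
```

For example, with 1000 bytes free and the default 0.90 fraction, a new 100-byte task alongside one half-done task holding a 100-byte allocation commits 150 bytes against a 900-byte budget, so it is accepted.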

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

