hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-657) Free temporary space should be modelled better
Date Thu, 16 Nov 2006 17:45:38 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-657?page=comments#action_12450470 ] 
            
Arun C Murthy commented on HADOOP-657:
--------------------------------------

I see the value in Doug's suggestion... for e.g. at some point in the future we might also
put in metrics like CPU load, VM stats etc. and this would let the JobTracker make 'smarter'
decisions about which task to assign to which TaskTrackers i.e. CPU-bound tasks to IO-laden
TTs and vice-versa. 

I do agree that it might be a very futuristic scenario, but the point is to keep the infrastructure
robust when we can...

> Free temporary space should be modelled better
> ----------------------------------------------
>
>                 Key: HADOOP-657
>                 URL: http://issues.apache.org/jira/browse/HADOOP-657
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.7.2
>            Reporter: Owen O'Malley
>         Assigned To: Arun C Murthy
>
> Currently, there is a configurable size that must be free for a task tracker to accept
a new task. However, that isn't a very good model of what the task is likely to take. I'd
like to propose:
> Map tasks:  totalInputSize * conf.getFloat("map.output.growth.factor", 1.0) / numMaps
> Reduce tasks: totalInputSize * 2 * conf.getFloat("map.output.growth.factor", 1.0) / numReduces
> where totalInputSize is the size of all the maps inputs for the given job.
> To start a new task, 
>   newTaskAllocation + (sum over running tasks of (1.0 - done) * allocation) >= 
>        free disk * conf.getFloat("mapred.max.scratch.allocation", 0.90);
> So in English, we will model the expected sizes of tasks and only task tasks that should
leave us a 10% margin. With:
> map.output.growth.factor -- the relative size of the transient data relative to the map
inputs
> mapred.max.scratch.allocation -- the maximum amount of our disk we want to allocate to
tasks.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message