hadoop-mapreduce-issues mailing list archives

From "Elton Tian (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-1603) Add a plugin class for the TaskTracker to determine available slots
Date Sun, 01 May 2011 11:32:03 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027454#comment-13027454 ]

Elton Tian commented on MAPREDUCE-1603:
---------------------------------------

I like the idea, but I don't think we can set the slots from hardware parameters alone; rather, it's
application dependent. For example, say you have a quad-core cluster and a dual-core cluster,
both with the same disks and interconnect. When you run a "Grep" with the same number of
slots on both clusters, I'd guess the processing times are similar. If you change the application
to "Sort", still using the same number of slots, there could be a noticeable difference.

So I guess that to get a reasonable slot count, we need to actually run the application, somehow.
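What I mean is something along the lines of the rough sketch below: time the real workload at a
few different slot counts and keep the fastest. The Workload hook and the slot range are purely
illustrative, not anything that exists in Hadoop today.

// Illustrative calibration loop: run the actual application at several slot
// counts and keep the configuration that finishes fastest. The Workload hook
// is hypothetical; it stands in for whatever reconfigures the cluster and
// submits the job.
public class SlotCalibrator {

  public interface Workload {
    /** Run the job with the given number of slots per node and block until it completes. */
    void runWithSlots(int slotsPerNode) throws Exception;
  }

  public static int findBestSlotCount(Workload workload, int minSlots, int maxSlots)
      throws Exception {
    int bestSlots = minSlots;
    long bestMillis = Long.MAX_VALUE;
    for (int slots = minSlots; slots <= maxSlots; slots++) {
      long start = System.currentTimeMillis();
      workload.runWithSlots(slots);
      long elapsed = System.currentTimeMillis() - start;
      if (elapsed < bestMillis) {
        bestMillis = elapsed;
        bestSlots = slots;
      }
    }
    return bestSlots;
  }
}

The point is just that the "right" slot count falls out of measuring the application, not out of
counting cores.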

> Add a plugin class for the TaskTracker to determine available slots
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1603
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1603
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: Steve Loughran
>            Priority: Minor
>
> Currently the number of available map and reduce slots is determined by the configuration.
> MAPREDUCE-922 has proposed working things out automatically, but that is going to depend a
> lot on the specific tasks - hard to get right for everyone.
> There is a Hadoop cluster near me that would like to use CPU time from other machines in
> the room, machines which cannot offer storage, but which will have spare CPU time when they
> aren't running code scheduled with a grid scheduler. The nodes could run a TT which would
> report a dynamic number of slots, the number depending upon the current grid workload.
> I propose we add a plugin point here, so that different people can develop plugin classes
> that determine the number of available slots based on workload, RAM, CPU, power budget,
> thermal parameters, etc. There is lots of space for customisation and improvement. And by
> having it as a plugin, people get to integrate with whatever datacentre schedulers they have
> without Hadoop itself needing to be altered. The base implementation would be as today:
> subtract the number of active map and reduce slots from the configured values, and push
> that out.
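
For reference, here is a rough sketch of what such a plugin point might look like. The interface,
class, and method names are purely illustrative (they are not an existing TaskTracker API); the
base implementation just does the subtraction described above.

// Illustrative only: these types are made up for this sketch and are not an
// existing TaskTracker API. Each public top-level type would live in its own file.

public interface AvailableSlotsPlugin {
  /** Number of map slots the TaskTracker should currently advertise. */
  int getAvailableMapSlots();

  /** Number of reduce slots the TaskTracker should currently advertise. */
  int getAvailableReduceSlots();
}

/**
 * Base implementation matching today's behaviour: subtract the number of
 * currently running tasks from the statically configured maximums.
 */
public class ConfiguredSlotsPlugin implements AvailableSlotsPlugin {
  private final int configuredMapSlots;
  private final int configuredReduceSlots;
  private volatile int runningMapTasks;
  private volatile int runningReduceTasks;

  public ConfiguredSlotsPlugin(int configuredMapSlots, int configuredReduceSlots) {
    this.configuredMapSlots = configuredMapSlots;
    this.configuredReduceSlots = configuredReduceSlots;
  }

  /** The TaskTracker would call this as tasks start and finish. */
  public void setRunningTasks(int runningMaps, int runningReduces) {
    this.runningMapTasks = runningMaps;
    this.runningReduceTasks = runningReduces;
  }

  @Override
  public int getAvailableMapSlots() {
    return Math.max(0, configuredMapSlots - runningMapTasks);
  }

  @Override
  public int getAvailableReduceSlots() {
    return Math.max(0, configuredReduceSlots - runningReduceTasks);
  }
}

A grid-integrated implementation would instead override the two getters to ask the external
scheduler how much spare capacity the node currently has.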

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
