hadoop-common-dev mailing list archives

From Eric Baldeschwieler <eri...@yahoo-inc.com>
Subject Re: Estimating number of worker nodes
Date Tue, 14 Feb 2006 05:48:40 GMT
Some of the future work we have discussed may make this impractical.
The number of available workers may become a variable that depends on
priority, other parallel work, etc.

Perhaps it is best to express your requirements in terms of the input  
or the output, for example size of input per job?
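
As a rough illustration of sizing a job by its input rather than by the cluster, here is a minimal sketch. The class InputSizedTasks, the method tasksForInput and the bytes-per-task figure are hypothetical names and values chosen for this example; none of them are part of the existing job submission API.

```java
// Hypothetical helper, not part of the Hadoop API: derives a task count
// from the total input size instead of from the number of worker nodes.
final class InputSizedTasks {

    /**
     * Returns the number of tasks needed so that each task handles at most
     * bytesPerTask bytes. Assumption: an empty input still gets one task.
     */
    static int tasksForInput(long totalInputBytes, long bytesPerTask) {
        if (totalInputBytes < 0 || bytesPerTask <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        // Ceiling division written to avoid overflow on very large inputs.
        long tasks = totalInputBytes / bytesPerTask
                + (totalInputBytes % bytesPerTask == 0 ? 0 : 1);
        return (int) Math.max(1L, Math.min((long) Integer.MAX_VALUE, tasks));
    }

    public static void main(String[] args) {
        long tenGiB = 10L << 30;
        long sixtyFourMiB = 64L << 20;
        // The result could be passed to setNumMapTasks(...) by the client.
        System.out.println(tasksForInput(tenGiB, sixtyFourMiB));
    }
}
```

With this approach the client never needs to know how many workers are active; the task count follows the data.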

On Feb 13, 2006, at 6:57 PM, Michel Tourn wrote:

> Problem:
>
> sometimes an MR application really wants to set an absolute number  
> of tasks:
>  num Reduce Tasks = 1 (so that the result is available in a single  
> file)
> The job submission API makes this case easy.
>
> but at other times an MR application would rather
> set a relative number of tasks:
>  num Map Tasks = 5 * num active worker nodes
>  num Reduce Tasks = 2 * num active worker nodes
>
> Intuitively, it seems to be a good thing to consider
> num-active-worker-nodes as a variable rather than
> as a constant known by the MapRed user.
>
> Reasons:
>  -the cluster may be expanded/shrunk without the MapRed user
>   being aware of it.
>  -the MapRed user may sometimes run tests on a smaller ('personal')
>   MapRed cluster.
>
>
> To implement this, some component of the system needs
> to know: "num active worker nodes"
>
> The JobTracker knows num. active nodes (taskTrackers.size())
> but the JobClient does not.
>
>
> So I can see two possible ways to add this functionality:
> (specifying a number of tasks relative to the cluster size)
>
> 1. JobTracker exposes num. active nodes to JobClient
>   (via an extension to the Job"Submission"Protocol)
>    Some client code connects to the JobTracker twice:
>    once to learn num-workers
>    once to submit a MapRed job,
>    using setNumMapTasks( 5 * num-workers )
>
> 2. JobConf is extended to accept a relative number of tasks.
>    Existing: setNumMapTasks(int n)               "mapred.map.tasks"
>    New :     setNumMapTasksPerTaskTracker(int n) "mapred.map.taskspertracker"
>    JobClient must set either value.
>    Then when JobTracker accepts a job, it simply translates if necessary
>     tasks = taskspertracker * num-workers.
>
>
> Do we agree this is useful,
> and which do you think is the best option? (1. or 2.)
>
> Thanks,
> Michel
>
>
>
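
For reference, the translation step in option 2 could look roughly like the sketch below. It is only an illustration under stated assumptions: the class RelativeTaskCount and the method resolveMapTasks are hypothetical, java.util.Properties stands in for JobConf, "set either value" is read as "set exactly one of the two", and a floor of one task when no trackers are active is an added assumption. The two property names are the ones quoted in the proposal.

```java
import java.util.Properties;

// Hypothetical sketch of option 2; Properties stands in for JobConf here.
final class RelativeTaskCount {

    // Property names as quoted in the proposal.
    static final String MAP_TASKS = "mapred.map.tasks";
    static final String MAP_TASKS_PER_TRACKER = "mapred.map.taskspertracker";

    /**
     * Translates the job's setting into an absolute map task count, given
     * the number of active task trackers known to the JobTracker
     * (taskTrackers.size() in the original message).
     */
    static int resolveMapTasks(Properties job, int activeTrackers) {
        String absolute = job.getProperty(MAP_TASKS);
        String relative = job.getProperty(MAP_TASKS_PER_TRACKER);

        // Assumption: exactly one of the two values must be set.
        if ((absolute == null) == (relative == null)) {
            throw new IllegalArgumentException(
                    "set exactly one of " + MAP_TASKS + " or " + MAP_TASKS_PER_TRACKER);
        }
        if (absolute != null) {
            return Integer.parseInt(absolute.trim());
        }
        // tasks = taskspertracker * num-workers, with overflow checked.
        int tasks = Math.multiplyExact(Integer.parseInt(relative.trim()), activeTrackers);
        // Assumption: never resolve to fewer than one task.
        return Math.max(1, tasks);
    }

    public static void main(String[] args) {
        Properties job = new Properties();
        job.setProperty(MAP_TASKS_PER_TRACKER, "5");
        System.out.println(resolveMapTasks(job, 4));
    }
}
```

Because the translation happens when the job is accepted, the client makes a single connection and the count reflects the cluster size at submission time.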

