hadoop-common-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Question on running simultaneous jobs
Date Thu, 10 Jan 2008 17:50:25 GMT
Aaron Kimball wrote:
> Multiple students should be able to submit jobs and if one student's 
> poorly-written task is grinding up a lot of cycles on a shared cluster, 
> other students still need to be able to test their code in the meantime; 

I think a simple approach to address this is to limit the number of 
tasks from a job that are permitted to execute simultaneously.  If, for 
example, you have a cluster of 50 dual-core nodes, with 100 map task 
slots and 100 reduce task slots, and the configured limit is 25 
simultaneous tasks/job, then four or more jobs will be able to run at a 
time.  This will permit faster jobs to pass slower jobs.  This approach 
also avoids two problems we've seen with HOD (Hadoop On Demand): nodes 
are underutilized during the tail of jobs, and input locality suffers.
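
The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration of a per-job cap on simultaneous tasks, not Hadoop's scheduler code: the function `allocate_slots`, the job names, and the pending task counts are all hypothetical, and the sketch assumes a single pool of 100 slots handed out in submission order.

```python
def allocate_slots(pending, total_slots, per_job_limit):
    """Hand out free task slots to jobs in submission order.

    pending: dict of job name -> number of runnable tasks (insertion
             order is treated as submission order).
    Returns a dict of job name -> number of tasks started.
    """
    running = {}
    free = total_slots
    for job, tasks in pending.items():
        if free == 0:
            break  # cluster is full; later jobs must wait
        # A job never gets more than its cap, its own task count,
        # or the number of slots still free.
        granted = min(tasks, per_job_limit, free)
        running[job] = granted
        free -= granted
    return running


# Hypothetical workload: one large job submitted first, then smaller ones.
pending = {"slow_job": 400, "job_b": 60, "job_c": 10, "job_d": 30, "job_e": 80}

# With a cap of 25 tasks/job on 100 slots, several jobs run side by side.
capped = allocate_slots(pending, total_slots=100, per_job_limit=25)

# With no effective cap, the first job takes every slot.
uncapped = allocate_slots(pending, total_slots=100, per_job_limit=100)

print(capped)
print(uncapped)
```

With the cap in place the large job holds only 25 slots and the remaining 75 go to the jobs behind it; without the cap it occupies all 100 and everything else waits.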

The JobTracker already handles simultaneously executing jobs, so the 
primary change required is just to task allocation, and thus should not 
prove intractable.

I've added a Jira issue for this:

   https://issues.apache.org/jira/browse/HADOOP-2573

Please add further comments there.

Doug
