hadoop-common-user mailing list archives

From "Joydeep Sen Sarma" <jssa...@facebook.com>
Subject RE: Question on running simultaneous jobs
Date Wed, 09 Jan 2008 23:22:55 GMT
> that can run(per job) at any given time.  
not possible afaik - but i will be happy to hear otherwise.
priorities are a good substitute though. there's no point needlessly restricting concurrency
if there's nothing else to run. if there is something else more important to run - then in
most cases, assigning a higher priority to that other thing would make the right thing happen.
the exception is long-running tasks (usually reducers), which cannot be preempted. (Hadoop does not
seem to use OS process priorities at all. I wonder if process priorities could be used as a
substitute for preemption.)
HOD is another solution that you might want to look into - my understanding is that with HOD
you can restrict the number of machines used by a job.
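
To make the priority-without-preemption point concrete, here is a minimal sketch of a single-slot scheduler that always starts the highest-priority waiting task but never interrupts a running one. It is a toy simulation, not Hadoop code: the class `PriorityNoPreemptDemo`, the `Task` fields, and the task names and durations are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class PriorityNoPreemptDemo {
    /** Hypothetical task record: higher priority value means more important. */
    static final class Task {
        final String name;
        final int priority;
        final int arrival;   // time the task is submitted
        final int duration;  // time the task occupies the slot

        Task(String name, int priority, int arrival, int duration) {
            this.name = name;
            this.priority = priority;
            this.arrival = arrival;
            this.duration = duration;
        }
    }

    /** Simulates one task slot; returns "name@startTime" entries in start order. */
    public static List<String> schedule(List<Task> tasks) {
        List<Task> byArrival = new ArrayList<>(tasks);
        byArrival.sort(Comparator.comparingInt((Task t) -> t.arrival));
        PriorityQueue<Task> ready = new PriorityQueue<>(
                Comparator.comparingInt((Task t) -> t.priority).reversed());
        List<String> log = new ArrayList<>();
        int time = 0;
        int next = 0;
        while (next < byArrival.size() || !ready.isEmpty()) {
            // Admit every task that has been submitted by the current time.
            while (next < byArrival.size() && byArrival.get(next).arrival <= time) {
                ready.add(byArrival.get(next++));
            }
            if (ready.isEmpty()) {
                // Slot is idle: jump ahead to the next submission.
                time = byArrival.get(next).arrival;
                continue;
            }
            Task t = ready.poll();
            log.add(t.name + "@" + time);
            time += t.duration; // no preemption: the task runs to completion
        }
        return log;
    }

    /** Fixed example: a long low-priority task, then a batch task, then an urgent one. */
    public static List<String> demo() {
        return schedule(Arrays.asList(
                new Task("longReducer", 1, 0, 100),
                new Task("batch", 1, 2, 10),
                new Task("urgent", 10, 5, 10)));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this toy run the urgent task jumps ahead of the batch task that was submitted earlier, but it still has to wait for the long task to finish, which is the limitation described above.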

From: Xavier Stevens [mailto:Xavier.Stevens@fox.com]
Sent: Wed 1/9/2008 2:57 PM
To: hadoop-user@lucene.apache.org
Subject: RE: Question on running simultaneous jobs

This doesn't solve the issue, because it sets the total number
of map/reduce tasks. When setting the total number of map tasks I get an
ArrayIndexOutOfBoundsException within Hadoop, I believe because of the
input dataset size (around 90 million lines).

I think it is important to make a distinction between setting the total
number of map/reduce tasks and the number that can run (per job) at any
given time.  I would like to restrict only the latter, while allowing
Hadoop to divide the data into chunks as it sees fit.
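
The distinction can be illustrated outside Hadoop with a plain Java thread pool: the number of tasks submitted is one knob, and the pool size that caps how many run at once is a separate knob. This is only an analogy under stated assumptions; `ConcurrencyCapDemo` and `runWithCap` are hypothetical names, and nothing here is a Hadoop API or a way to configure a Hadoop job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrencyCapDemo {
    /**
     * Submits totalTasks units of work to a pool that runs at most
     * maxConcurrent of them at a time, and returns the peak number
     * observed running simultaneously.
     */
    public static int runWithCap(int totalTasks, int maxConcurrent) throws Exception {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < totalTasks; i++) {
                futures.add(pool.submit(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(5); // stand-in for real work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    running.decrementAndGet();
                    completed.incrementAndGet();
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for every task to finish
            }
        } finally {
            pool.shutdown();
        }
        if (completed.get() != totalTasks) {
            throw new IllegalStateException("not all tasks completed");
        }
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        // 20 tasks in total, but never more than 4 running at the same time.
        System.out.println("peak concurrent tasks: " + runWithCap(20, 4));
    }
}
```

All 20 tasks complete, yet the peak never exceeds the cap of 4: the total amount of work is unchanged and only the share of resources it occupies at once is limited.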

-----Original Message-----
From: Ted Dunning [mailto:tdunning@veoh.com]
Sent: Wednesday, January 09, 2008 1:50 PM
To: hadoop-user@lucene.apache.org
Subject: Re: Question on running simultaneous jobs

You may need to upgrade, but 0.15.1 does just fine with multiple jobs in
the cluster.  Use conf.setNumMapTasks(int) and

On 1/9/08 11:25 AM, "Xavier Stevens" <Xavier.Stevens@fox.com> wrote:

> Does Hadoop support running simultaneous jobs?  If so, what parameters
> do I need to set in my job configuration?  We basically want to give a
> job that takes a really long time, half of the total resources of the
> cluster so other jobs don't queue up behind it.
> I am using Hadoop 0.14.2 currently.  I tried setting
> mapred.tasktracker.tasks.maximum to be half of the maximum specified
> in mapred-default.xml.  This shows the change in the web
> administration page for the job, but it has no effect on the actual
> numbers of tasks running.
> Thanks,
> Xavier
