Subject: RE: Question on running simultaneous jobs
From: "Joydeep Sen Sarma" <jssarma@facebook.com>
To: hadoop-user@lucene.apache.org
Date: Wed, 9 Jan 2008 15:22:55 -0800

> that can run (per job) at any given time.

Not possible, AFAIK - but I would be happy to hear otherwise.

Priorities are a good substitute, though. There is no point needlessly restricting concurrency if there is nothing else to run. If there is something else more important to run, then in most cases assigning a higher priority to that other thing will make the right thing happen.

The exception is long-running tasks (usually reducers) that cannot be preempted. (Hadoop does not seem to use OS process priorities at all. I wonder whether process priorities could be used as a substitute for preemption.)

HOD is another solution you might want to look into - my understanding is that with HOD you can restrict the number of machines used by a job.
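For the priority route, here is a minimal sketch of what submission might look like - assuming the mapred.job.priority property is recognized by your release (I have not checked exactly when it appeared), and with a made-up class name:

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class HighPriorityJob {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(HighPriorityJob.class);
            conf.setJobName("important-job");
            // Assumption: recognized values are VERY_LOW, LOW, NORMAL,
            // HIGH and VERY_HIGH. The JobTracker hands out tasks from
            // higher-priority jobs first, but does not preempt tasks
            // that are already running.
            conf.set("mapred.job.priority", "HIGH");
            // ... set mapper/reducer classes and input/output paths as usual ...
            JobClient.runJob(conf);
        }
    }

Everything else can then run at the default NORMAL priority and pick up whatever slots are left over.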
________________________________

From: Xavier Stevens [mailto:Xavier.Stevens@fox.com]
Sent: Wed 1/9/2008 2:57 PM
To: hadoop-user@lucene.apache.org
Subject: RE: Question on running simultaneous jobs

This doesn't solve the issue, because it sets the total number of map/reduce tasks. When I set the total number of map tasks I get an ArrayIndexOutOfBoundsException from within Hadoop, I believe because of the size of the input dataset (around 90 million lines).

I think it is important to distinguish between setting the total number of map/reduce tasks and the number that can run (per job) at any given time. I would like to restrict only the latter, while letting Hadoop divide the data into chunks as it sees fit.

-----Original Message-----
From: Ted Dunning [mailto:tdunning@veoh.com]
Sent: Wednesday, January 09, 2008 1:50 PM
To: hadoop-user@lucene.apache.org
Subject: Re: Question on running simultaneous jobs

You may need to upgrade, but 0.15.1 does just fine with multiple jobs in the cluster.

Use conf.setNumMapTasks(int) and conf.setNumReduceTasks(int).

On 1/9/08 11:25 AM, "Xavier Stevens" wrote:

> Does Hadoop support running simultaneous jobs? If so, what parameters
> do I need to set in my job configuration? We basically want to give a
> job that takes a really long time half of the total resources of the
> cluster, so that other jobs don't queue up behind it.
>
> I am currently using Hadoop 0.14.2. I tried setting
> mapred.tasktracker.tasks.maximum to half of the maximum specified in
> mapred-default.xml. This shows the change on the web administration
> page for the job, but it has no effect on the actual number of tasks
> running.
>
> Thanks,
>
> Xavier
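For completeness, here is what those two calls look like in context - a sketch against the old mapred API with a made-up class name, and note that both set per-job totals rather than concurrency limits:

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class SizedJob {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(SizedJob.class);
            // Per-job totals, not a cap on simultaneous tasks:
            // setNumMapTasks() is only a hint (the input splits determine
            // the actual map count), while setNumReduceTasks() is honored
            // exactly.
            conf.setNumMapTasks(500);
            conf.setNumReduceTasks(50);
            // ... set mapper/reducer classes and input/output paths as usual ...
            JobClient.runJob(conf);
        }
    }

As for mapred.tasktracker.tasks.maximum: as far as I can tell it caps the task slots per TaskTracker across all jobs and is read by the daemon at startup, which would explain why setting it in a job configuration shows up on the job's web page but has no effect on how many tasks actually run.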