hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Question on running simultaneous jobs
Date Thu, 10 Jan 2008 01:30:55 GMT

What is the status of Hadoop on Demand?  Is it ready for prime time?


On 1/9/08 4:58 PM, "Aaron Kimball" <ak@cs.washington.edu> wrote:

> I will add to the discussion that the ability to have multiple tasks of
> equal priority all making progress simultaneously is important in
> academic environments. There are a number of undergraduate programs
> which are starting to use Hadoop in code labs for students.
> 
> Multiple students should be able to submit jobs and if one student's
> poorly-written task is grinding up a lot of cycles on a shared cluster,
> other students still need to be able to test their code in the meantime;
> ideally, they would not need to enter a lengthy job queue. ... I'd say
> that this actually applies to development clusters in general, where
> individual task performance is less important than the ability of
> multiple developers to test code concurrently.
> 
> - Aaron
> 
> 
> 
> Joydeep Sen Sarma wrote:
>>> that can run(per job) at any given time.
>>  
>> not possible afaik - but i will be happy to hear otherwise.
>>  
>> priorities are a good substitute though. there's no point needlessly
>> restricting concurrency if there's nothing else to run. if there is something
>> else more important to run - then in most cases, assigning a higher priority
>> to that other thing would make the right thing happen.
>>  
>> except with long running tasks (usually reducers) that cannot be preempted.
>> (Hadoop does not seem to use OS process priorities at all. I wonder if
>> process priorities can be used as a substitute for pre-emption.)
>>  
>> HOD is another solution that you might want to look into - my understanding
>> is that with HOD you can restrict the number of machines used by a job.
>>  
>> ________________________________
>> 
>> From: Xavier Stevens [mailto:Xavier.Stevens@fox.com]
>> Sent: Wed 1/9/2008 2:57 PM
>> To: hadoop-user@lucene.apache.org
>> Subject: RE: Question on running simultaneous jobs
>> 
>> 
>> 
>> This doesn't solve the issue, because it sets the total number of
>> map/reduce tasks. When setting the total number of map tasks I get an
>> ArrayIndexOutOfBoundsException within Hadoop; I believe this is because
>> of the input dataset size (around 90 million lines).
>> 
>> I think it is important to make a distinction between setting the total
>> number of map/reduce tasks and the number that can run (per job) at any
>> given time.  I would like to restrict only the latter, while allowing
>> Hadoop to divide the data into chunks as it sees fit.
>> 
>> 
>> -----Original Message-----
>> From: Ted Dunning [mailto:tdunning@veoh.com]
>> Sent: Wednesday, January 09, 2008 1:50 PM
>> To: hadoop-user@lucene.apache.org
>> Subject: Re: Question on running simultaneous jobs
>> 
>> 
>> You may need to upgrade, but 0.15.1 does just fine with multiple jobs in
>> the cluster.  Use conf.setNumMapTasks(int) and
>> conf.setNumReduceTasks(int).
>> 
>> 
>> On 1/9/08 11:25 AM, "Xavier Stevens" <Xavier.Stevens@fox.com> wrote:
>> 
>>> Does Hadoop support running simultaneous jobs?  If so, what parameters
>>> do I need to set in my job configuration?  We basically want to give a
>>> job that takes a really long time, half of the total resources of the
>>> cluster so other jobs don't queue up behind it.
>>> 
>>> I am using Hadoop 0.14.2 currently.  I tried setting
>>> mapred.tasktracker.tasks.maximum to be half of the maximum specified
>>> in mapred-default.xml.  This shows the change in the web
>>> administration page for the job, but it has no effect on the actual
>>> numbers of tasks running.
>>> 
>>> Thanks,
>>> 
>>> Xavier
>>> 
>> 
>> 
>> 
>> 
>> 
>> 
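
To make the distinction raised in the thread concrete (the total number of tasks in a job versus how many of them may run at once), here is a minimal sketch in plain Java. It is not Hadoop code and does not use any Hadoop API; the class name PerJobConcurrencyCap, the method runJob, and the pool standing in for cluster slots are all hypothetical, chosen only for illustration. The job still has as many tasks as the data calls for, but a semaphore keeps it from occupying more than its share of the shared slots.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class PerJobConcurrencyCap {

    /**
     * Runs totalTasks tasks on a shared pool while letting at most
     * maxConcurrent of them run at the same time. Returns the highest
     * number of tasks observed running simultaneously.
     */
    public static int runJob(ExecutorService sharedSlots, int totalTasks, int maxConcurrent)
            throws Exception {
        Semaphore permits = new Semaphore(maxConcurrent);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();

        for (int i = 0; i < totalTasks; i++) {
            // Block submission until this job is below its own cap.
            permits.acquire();
            futures.add(sharedSlots.submit(() -> {
                try {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(5); // stand-in for real map or reduce work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                    permits.release(); // free the slot for this job's next task
                }
            }));
        }

        // Wait for every task of the job to finish.
        for (Future<?> f : futures) {
            f.get();
        }
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical cluster with 8 slots; the job has 40 tasks but may use only 4 slots.
        ExecutorService clusterSlots = Executors.newFixedThreadPool(8);
        try {
            int peak = runJob(clusterSlots, 40, 4);
            System.out.println("peak concurrent tasks: " + peak);
        } finally {
            clusterSlots.shutdown();
        }
    }
}
```

In this sketch the remaining four slots stay free for other jobs no matter how many tasks the long job has, which is the behaviour asked for in the original question. Whether a given Hadoop release offers an equivalent per-job setting is a separate matter that the thread leaves open.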

