hadoop-mapreduce-user mailing list archives

From Arun C Murthy <...@yahoo-inc.com>
Subject Re: Multiple Jobs
Date Mon, 13 Sep 2010 22:39:29 GMT
Moving to mapreduce-user@, bcc common-user@. Please use the appropriate
project lists for discussions.

----

The default scheduler tries to get all tasks of a single job done
before moving on to the next job of the same 'priority'. Whether
multiple jobs run in 'parallel' therefore depends on how much of the
cluster's capacity is taken up by the highest-'priority' job at the
head of the queue.
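
If all you need is for one job to get ahead of another in the FIFO
queue, you can bump its priority. A quick sketch (the job ID, jar, and
class names below are made up; the -D form assumes your job uses
ToolRunner so generic options are honoured):

  # raise the priority of an already-submitted job
  hadoop job -set-priority job_201009131234_0001 VERY_HIGH

  # or set it at submission time
  hadoop jar myjob.jar com.example.MyJob -D mapred.job.priority=HIGH in out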

So, the behaviour you are seeing is probably the result of a single  
job taking up all of the cluster's capacity in terms of map/reduce  
slots.

The capacity-scheduler, when configured appropriately, will enforce
capacity constraints so that, for example, jobs in a single queue
cannot take up more than that queue's capacity; you will still see
similar serialization among jobs within the same queue, though. The CS
also has user-limits to ensure a single user doesn't take up all of a
queue's capacity, etc.
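
As a rough sketch, the setup looks something like the following (the
queue name and percentages are made up, and some property names changed
across 0.20 releases, e.g. guaranteed-capacity vs. capacity, so check
the docs for your version). In mapred-site.xml:

  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
  </property>
  <property>
    <name>mapred.queue.names</name>
    <value>default,pig</value>
  </property>

and in capacity-scheduler.xml:

  <property>
    <name>mapred.capacity-scheduler.queue.pig.guaranteed-capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.pig.minimum-user-limit-percent</name>
    <value>25</value>
  </property>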

In your case, you might be able to get away with simply sending each
of the jobs to a different queue, e.g. as sketched below.
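
With queues like the ones above in place, picking a queue per job is
just a matter of setting mapred.job.queue.name. Since you're submitting
from Pig, something like this should work (queue name is made up; the
set command needs a reasonably recent Pig, otherwise pass it via -D):

  pig -Dmapred.job.queue.name=pig myscript.pig

or inside the script:

  set mapred.job.queue.name 'pig';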

Also, please be aware that the CS in 0.20.2 is quite dated; you might
want to use the Yahoo! GitHub repository
(http://github.com/yahoo/hadoop-common) for the latest version of the CS.

I'm currently working to get the Yahoo! codebase released as an Apache
release (maybe hadoop-0.20-security); once that is done you should be
able to use the latest CapacityScheduler via an Apache release.

The fair-scheduler tries to share the cluster fairly among
applications, users, and pools. Please refer to its documentation for
more information.
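
For comparison, a minimal fair-scheduler setup (the pool name, slot
counts, and file path below are all made up) would be, in
mapred-site.xml:

  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>
  <property>
    <name>mapred.fairscheduler.allocation.file</name>
    <value>/path/to/allocations.xml</value>
  </property>

with pools defined in the allocation file:

  <?xml version="1.0"?>
  <allocations>
    <pool name="pig">
      <minMaps>10</minMaps>
      <minReduces>5</minReduces>
    </pool>
  </allocations>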

Arun

On Sep 13, 2010, at 2:54 PM, Eric Sammer wrote:

> The default scheduler in Hadoop is a FIFO scheduler. You can configure
> either the Fair Scheduler or the Capacity Scheduler to allow jobs to
> run in parallel and "share" the cluster resources. See
> http://hadoop.apache.org/common/docs/r0.20.2/fair_scheduler.html and
> http://hadoop.apache.org/common/docs/r0.20.2/capacity_scheduler.html
> respectively.
>
> On Mon, Sep 13, 2010 at 5:36 PM, Rahul Malviya <rmalviya@apple.com> wrote:
>
>> Hi,
>>
>> I am running Pig jobs on Hadoop cluster.
>>
>> I just wanted to know whether I can run multiple jobs on the hadoop
>> cluster simultaneously.
>>
>> Currently when I start two jobs on hadoop they run in a serial fashion.
>>
>> Is there a way to run N jobs simultaneously on hadoop ?
>>
>> Thanks,
>> Rahul
>
> -- 
> Eric Sammer
> twitter: esammer
> data: www.cloudera.com

