hadoop-mapreduce-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Query over efficient utilization of cluster using fair scheduling
Date Fri, 15 Jan 2010 16:17:48 GMT
Hi Pallavi,

If you remove userMaxJobsDefault, the default value is Integer.MAX_VALUE;
that is, the number of running jobs per user is unconstrained by this limit.
The other limits and fair sharing then apply when multiple jobs are
submitted. So, if you haven't set any of the minimum slots (minMaps,
minReduces), and the jobs are all at the same priority, they'll share the
slots equally. Please check out the fair scheduler documentation in
docs/fair_scheduler.pdf in your distro.
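
For illustration, here is a minimal sketch of what an allocations.xml along
these lines might look like. The pool name, user name, and all numeric values
are hypothetical placeholders, not taken from Pallavi's cluster; check the
element names against the fair scheduler documentation for your version
before relying on them.

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Hypothetical pool; minMaps and minReduces default to 0 when omitted. -->
  <pool name="example_pool">
    <minMaps>0</minMaps>
    <minReduces>0</minReduces>
  </pool>

  <!-- Hypothetical per-user override of the running-job limit. -->
  <user name="example_user">
    <maxRunningJobs>4</maxRunningJobs>
  </user>

  <!-- Omit this element entirely to leave the per-user limit unconstrained;
       a value of 1 lets each user run only one job at a time. -->
  <userMaxJobsDefault>3</userMaxJobsDefault>
</allocations>
```

With userMaxJobsDefault set to 1, a second job from the same user waits even
on an idle cluster, which matches the behavior described in the original
question.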

-Todd

On Fri, Jan 15, 2010 at 1:15 AM, Pallavi Palleti <
pallavi.palleti@corp.aol.com> wrote:

>  Hi Todd,
>
> Thanks for the reply. I figured out that userMaxJobsDefault was set to
> 1. I have another query regarding the same. What will happen if I remove
> the userMaxJobsDefault property? What is the default value? Would setting a
> value higher than 1 for a particular user cause other users' jobs to stall
> until that user's jobs are over? If so, is there a way to specify that a
> user can take at most some percentage of the total idle mappers available at
> that time? And, if that threshold is exceeded, can we let users run only
> some default number of jobs at a time? This way, we can avoid stalling other
> users' jobs and also efficiently utilize the cluster. Kindly clarify.
>
> Thanks
> Pallavi
>
>
>
> Todd Lipcon wrote:
>
> Hi Pallavi,
>
>  This doesn't sound right. Can you visit
> http://jobtracker:50030/scheduler?advanced and maybe send a screenshot?
> And also upload the allocations.xml file you're using?
>
>  It sounds like you've managed to set either userMaxJobsDefault or
> maxRunningJobs for that user to 1.
>
>  -Todd
>
> On Thu, Jan 14, 2010 at 9:05 PM, Pallavi Palleti <
> pallavi.palleti@corp.aol.com> wrote:
>
>> Hi all,
>>
>> I am experimenting with the fair scheduler in a cluster of 10 machines. The
>> users are given the default value ("0") for minMaps and minReduces in the
>> fair scheduler parameters. When I tried to run two jobs under the same
>> username, the fair scheduler gave 100% fair share to the first job (needs 2
>> mappers), and the second job (needs 10 mappers) stayed in waiting mode though
>> the cluster was totally idle. Running these jobs simultaneously would take
>> only 10% of the total available mappers. However, the second job is not
>> allowed to run until the first job is over. It would be great if someone
>> could suggest some parameter tuning that allows efficient utilization of
>> the cluster. By efficient, I mean allowing jobs to run when the cluster is
>> idle rather than leaving them in waiting mode. I am not sure whether setting
>> "minMaps, minReduces" for each user would resolve the issue. Kindly clarify.
>>
>> Thanks
>> Pallavi
>>
>
>
