hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Understanding fair schedulers
Date Wed, 25 Jan 2012 15:21:53 GMT
Not exactly. With the poolnameproperty set to group.name, each job's
group name is used as its pool name. So you only need <pool name="ABC">
to configure the group "ABC". Does that make sense?
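
As a rough sketch only: the limits below are copied from the allocation
file quoted further down, "ABC" and "XYZ" are assumed to be group names on
your cluster, and moving maxRunningJobs into each pool is my assumption
about what you want the per-group limit to be. With
mapred.fairscheduler.poolnameproperty set to group.name, the file could
look something like this:

```xml
<?xml version="1.0"?>
<!-- Sketch only: assumes mapred.fairscheduler.poolnameproperty is set to
     group.name, and that "ABC" and "XYZ" are groups your users belong to. -->
<allocations>
  <!-- Jobs submitted under group "ABC" are placed in this pool. -->
  <pool name="ABC">
    <minMaps>10</minMaps>
    <minReduces>10</minReduces>
    <maxMaps>192</maxMaps>
    <maxReduces>96</maxReduces>
    <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
    <!-- Per-pool job limit, used here in place of a separate group element. -->
    <maxRunningJobs>6</maxRunningJobs>
  </pool>
  <!-- Jobs submitted under group "XYZ" are placed in this pool. -->
  <pool name="XYZ">
    <minMaps>10</minMaps>
    <minReduces>10</minReduces>
    <maxMaps>192</maxMaps>
    <maxReduces>96</maxReduces>
    <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
    <maxRunningJobs>6</maxRunningJobs>
  </pool>
  <!-- Cluster-wide defaults are listed once, at the top level. -->
  <userMaxJobsDefault>3</userMaxJobsDefault>
  <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
</allocations>
```

The poolnameproperty itself is set in the JobTracker's configuration
(mapred-site.xml), not in the allocation file.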

On Wed, Jan 25, 2012 at 8:49 PM, praveenesh kumar <praveenesh@gmail.com> wrote:
> In that case, would I use a group name tag in the allocations file, like
> this, inside each pool?
>
> <group name="ABC">
>    <maxRunningJobs>6</maxRunningJobs>
>  </group>
>
> Thanks,
> Praveenesh
>
> On Wed, Jan 25, 2012 at 8:08 PM, Harsh J <harsh@cloudera.com> wrote:
>
>> A solution would be to place your users into groups, and use the
>> group.name identifier as the poolnameproperty. Would this work for
>> you instead?
>>
>> On Wed, Jan 25, 2012 at 8:00 PM, praveenesh kumar <praveenesh@gmail.com> wrote:
>> > Also, with the above-mentioned method, my problem is that I end up with
>> > one pool per user (that's obviously not a good way of configuring
>> > schedulers). How can I assign multiple users to one pool in the XML
>> > properties, so that I don't have to set any options inside my code?
>> >
>> > Thanks,
>> > Praveenesh
>> >
>> > On Wed, Jan 25, 2012 at 7:55 PM, praveenesh kumar <praveenesh@gmail.com> wrote:
>> >
>> >> I am looking for a solution where we can do it permanently, without
>> >> specifying these things inside jobs.
>> >> I want to keep these things hidden from the end-user.
>> >> The end-user would just write Pig scripts, and all the jobs submitted by a
>> >> particular user would be submitted to their respective pool automatically.
>> >>
>> >> What I am doing right now is something like this
>> >>
>> >>  <allocations>
>> >>   <pool name="ABC">
>> >>     <minMaps>10</minMaps>
>> >>     <minReduces>10</minReduces>
>> >>     <maxMaps>192</maxMaps>
>> >>     <maxReduces>96</maxReduces>
>> >>     <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
>> >>   </pool>
>> >>   <user name="ABC">
>> >>
>> >>     <maxRunningJobs>6</maxRunningJobs>
>> >>   </user>
>> >>   <userMaxJobsDefault>3</userMaxJobsDefault>
>> >>   <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
>> >>
>> >>   <pool name="XYZ">
>> >>     <minMaps>10</minMaps>
>> >>     <minReduces>10</minReduces>
>> >>     <maxMaps>192</maxMaps>
>> >>     <maxReduces>96</maxReduces>
>> >>     <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
>> >>   </pool>
>> >>   <user name="XYZ">
>> >>
>> >>    <maxRunningJobs>6</maxRunningJobs>
>> >>   </user>
>> >>   <userMaxJobsDefault>3</userMaxJobsDefault>
>> >>   <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
>> >>
>> >> </allocations>
>> >>
>> >> By doing this, I am able to see different pools per user, without
>> >> mentioning anything inside the jobs.
>> >> Jobs automatically go to their respective pools.
>> >>
>> >> But what I want to know is: is this the right way to do it?
>> >>
>> >> Thanks,
>> >> Praveenesh
>> >>
>> >>
>> >>
>> >> On Wed, Jan 25, 2012 at 7:36 PM, Harsh J <harsh@cloudera.com> wrote:
>> >>
>> >>> Set the property in Pig with the 'set' command or other ways:
>> >>> http://pig.apache.org/docs/r0.9.1/cmds.html#set or
>> >>> http://pig.apache.org/docs/r0.9.1/start.html#properties
>> >>>
>> >>> As Srinivas covered earlier, pool allocation can be done per-user if
>> >>> you set the scheduler poolnameproperty to "user.name". Per group if
>> >>> you set the property to "group.name".
>> >>>
>> >>> Then you can provide per-poolname config overrides via the "pool"
>> >>> element config described in
>> >>> http://hadoop.apache.org/common/docs/current/fair_scheduler.html#Allocation+File+%28fair-scheduler.xml%29
>> >>>
>> >>> On Wed, Jan 25, 2012 at 7:01 PM, praveenesh kumar <praveenesh@gmail.com> wrote:
>> >>> > I am running Pig jobs. How can I specify which pool they should run in?
>> >>> > Also, do you mean that pool allocation is done per job, not per user?
>> >>> >
>> >>> >
>> >>> > On Wed, Jan 25, 2012 at 6:14 PM, Srinivas Surasani <vasajb@gmail.com> wrote:
>> >>> >
>> >>> >> Praveenesh,
>> >>> >>
>> >>> >> You can try setting "mapred.fairscheduler.pool" to your pool name
>> >>> >> while running the job. By default,
>> >>> >> mapred.fairscheduler.poolnameproperty is set to user.name (each job
>> >>> >> run by a user is allocated to the pool named after that user), and
>> >>> >> you can also change this property to group.name.
>> >>> >>
>> >>> >> Srinivas --
>> >>> >>
>> >>> >> Also, you can set
>> >>> >>
>> >>> >> On Wed, Jan 25, 2012 at 6:24 AM, praveenesh kumar <praveenesh@gmail.com> wrote:
>> >>> >>
>> >>> >> > Understanding Fair Schedulers better.
>> >>> >> >
>> >>> >> > Can we create multiple pools in the Fair Scheduler? I guess yes.
>> >>> >> > Please correct me.
>> >>> >> >
>> >>> >> > Suppose I have 2 pools in my fair-scheduler.xml
>> >>> >> >
>> >>> >> > 1. Hadoop-users: Min map: 10, Max map: 50, Min Reduce: 10, Max Reduce: 50
>> >>> >> > 2. Admin-users: Min map: 20, Max map: 80, Min Reduce: 20, Max Reduce: 80
>> >>> >> >
>> >>> >> > I have 5 users who will be using these pools. How do I allocate
>> >>> >> > specific pools to specific users?
>> >>> >> >
>> >>> >> > Suppose I want user1 and user2 to use the "Hadoop-users" pool, and
>> >>> >> > user3, user4 and user5 to use the "Admin-users" pool.
>> >>> >> >
>> >>> >> > In http://hadoop.apache.org/common/docs/r0.20.205.0/fair_scheduler.html
>> >>> >> > they show an allocation file something like this.
>> >>> >> >
>> >>> >> > <?xml version="1.0"?>
>> >>> >> > <allocations>
>> >>> >> >  <pool name="sample_pool">
>> >>> >> >    <minMaps>5</minMaps>
>> >>> >> >    <minReduces>5</minReduces>
>> >>> >> >    <maxMaps>25</maxMaps>
>> >>> >> >    <maxReduces>25</maxReduces>
>> >>> >> >    <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
>> >>> >> >  </pool>
>> >>> >> >  <user name="sample_user">
>> >>> >> >    <maxRunningJobs>6</maxRunningJobs>
>> >>> >> >  </user>
>> >>> >> >  <userMaxJobsDefault>3</userMaxJobsDefault>
>> >>> >> >  <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
>> >>> >> > </allocations>
>> >>> >> >
>> >>> >> > I tried creating more pools, and that works, but how do I assign
>> >>> >> > users to specific pools?
>> >>> >> >
>> >>> >> > Thanks,
>> >>> >> > Praveenesh
>> >>> >> >
>> >>> >>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Harsh J
>> >>> Customer Ops. Engineer, Cloudera
>> >>>
>> >>
>> >>
>>
>>
>>
>> --
>> Harsh J
>> Customer Ops. Engineer, Cloudera
>>



-- 
Harsh J
Customer Ops. Engineer, Cloudera
