hadoop-hdfs-user mailing list archives

From Patai Sangbutsarakum <silvianhad...@gmail.com>
Subject Re: Fair scheduler.
Date Tue, 16 Oct 2012 22:32:33 GMT
Hi Harsh,
Thanks for breaking it down clearly. I would say I was 98% successful
following the instructions.
The remaining 2% is about hadoop.tmp.dir.

Let's say I have 2 users:
userA is the user that starts hdfs and mapred
userB is a regular user

If I use the default value of hadoop.tmp.dir,
I can submit a job as userA but not as userB:
user=userB, access=WRITE, inode="/tmp/hadoop-userA/mapred/staging"

I googled around; someone recommended changing hadoop.tmp.dir to /tmp/hadoop.
That almost works, but there is a catch.

If I submit as userA, it creates /tmp/hadoop on the local machine,
owned by userA.userA.
When I then try to submit a job from the same machine as userB, I
get "Error creating temp dir in hadoop.tmp.dir /tmp/hadoop due to
Permission denied"
(because /tmp/hadoop is owned by userA.userA). Vice versa, if I delete
/tmp/hadoop and let userB create the directory, userA will not
be able to submit jobs.

Which is the right approach to take?
Please advise.


On Mon, Oct 15, 2012 at 3:18 PM, Harsh J <harsh@cloudera.com> wrote:
> Hi Patai,
> Reply inline.
> On Tue, Oct 16, 2012 at 2:57 AM, Patai Sangbutsarakum
> <silvianhadoop@gmail.com> wrote:
>> Thanks for the input.
>> I am reading the document; I forgot to mention that I am on cdh3u4.
> That version should support all of this.
>>> If you point your poolname property to mapred.job.queue.name, then you
>>> can leverage the Per-Queue ACLs
>> Does that mean that if I plan to have 3 fair scheduler pools, I have to
>> configure 3 capacity scheduler queues, so that each pool
>> can leverage the Per-Queue ACL of its queue?
> Queues are not hard-tied into CapacityScheduler. You can have generic
> queues in MR. And FairScheduler can bind its Pool concept into the
> Queue configuration.
> All you need to do is the following:
> 1. Map the FairScheduler pool name onto the queue name itself:
> mapred.fairscheduler.poolnameproperty set to 'mapred.job.queue.name'
> 2. Define your required queues:
> mapred.job.queues set to "default,foo,bar" for example, for 3 queues:
> default, foo and bar.
> 3. Define Submit ACLs for each Queue:
> mapred.queue.default.acl-submit-job set to "patai,foobar users,adm"
> (usernames groupnames)
> mapred.queue.foo.acl-submit-job set to "spam eggs"
> Likewise for the remaining queues, as needed…
> 4. Enable ACLs and restart JT.
> mapred.acls.enabled set to "true"
> 5. Users then use the right API to set queue names before submitting
> jobs, or use -Dmapred.job.queue.name=value via CLI (if using Tool):
> http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobConf.html#setQueueName(java.lang.String)
> 6. Done.
> Let us know if this works!
> --
> Harsh J
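
For reference, steps 1 to 5 above can be sketched as a small self-contained
Java program that prints the listed properties in Hadoop's
<configuration>/<property> XML layout. This is only an illustration: the
class and method names (QueueAclConfigSketch, settingsFromSteps, render,
queueFlag) are hypothetical, the values are the example values from the
steps, and the thread does not say which config file each property belongs
in, so the output is shown as a single fragment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: prints the properties from
// steps 1-4 as a Hadoop-style XML configuration fragment.
public class QueueAclConfigSketch {

    // Renders one <property> element. Values are assumed to contain no
    // XML special characters, so no escaping is done here.
    static String property(String name, String value) {
        return "  <property>\n"
             + "    <name>" + name + "</name>\n"
             + "    <value>" + value + "</value>\n"
             + "  </property>\n";
    }

    // Wraps every setting in a single <configuration> element.
    static String render(Map<String, String> settings) {
        StringBuilder xml = new StringBuilder("<configuration>\n");
        for (Map.Entry<String, String> e : settings.entrySet()) {
            xml.append(property(e.getKey(), e.getValue()));
        }
        return xml.append("</configuration>\n").toString();
    }

    // The example settings from the steps, in the order given.
    static Map<String, String> settingsFromSteps() {
        Map<String, String> s = new LinkedHashMap<>();
        // Step 1: pool name follows the job's queue name.
        s.put("mapred.fairscheduler.poolnameproperty", "mapred.job.queue.name");
        // Step 2: three queues.
        s.put("mapred.job.queues", "default,foo,bar");
        // Step 3: submit ACLs, written as "usernames groupnames".
        s.put("mapred.queue.default.acl-submit-job", "patai,foobar users,adm");
        s.put("mapred.queue.foo.acl-submit-job", "spam eggs");
        // Step 4: turn ACL checking on (the JT restart is done separately).
        s.put("mapred.acls.enabled", "true");
        return s;
    }

    // Step 5: the CLI flag a Tool-based job would take to pick a queue.
    static String queueFlag(String queue) {
        return "-Dmapred.job.queue.name=" + queue;
    }

    public static void main(String[] args) {
        System.out.print(render(settingsFromSteps()));
        System.out.println(queueFlag("foo"));
    }
}
```

Running main prints the fragment followed by the example flag for queue
"foo"; the remaining queue ("bar") would get its own acl-submit-job entry
in the same way.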
