hadoop-hdfs-user mailing list archives

From "Goldstone, Robin J." <goldsto...@llnl.gov>
Subject Re: Fair scheduler.
Date Tue, 16 Oct 2012 22:51:00 GMT
This is similar to issues I ran into with permissions/ownership of
mapred.system.dir when using the fair scheduler.  We are instructed to set
the ownership of mapred.system.dir to mapred:hadoop and then when the job
tracker starts up (running as user mapred) it explicitly sets the
permissions on this directory to 700.  Meanwhile, when I run a job as
a regular user, the job tries to write into mapred.system.dir but
can't, due to the ownership/permissions that have been established.

Per discussion with Arpit Gupta, this is a bug with the fair scheduler and
it appears from your experience that there are similar issues with
hadoop.tmp.dir.  The whole idea of the fair scheduler is to run jobs under
the user's identity rather than as user mapred.  This is good from a
security perspective yet it seems no one bothered to account for this in
terms of the permissions that need to be set in the various directories to
enable this. 

Until this is sorted out by the Hadoop developers, I've put my attempts to
use the fair scheduler on hold...

Robin Goldstone, LLNL

On 10/16/12 3:32 PM, "Patai Sangbutsarakum" <silvianhadoop@gmail.com> wrote:

>Hi Harsh,
>Thanks for breaking it down clearly. I would say I am 98% successful
>following the instructions.
>The 2% is about hadoop.tmp.dir
>Let's say I have 2 users:
>userA is the user that starts hdfs and mapred
>userB is a regular user
>If I use the default value of hadoop.tmp.dir,
>I can submit a job as userA but not as userB:
>user=userB, access=WRITE, inode="/tmp/hadoop-userA/mapred/staging"
>I googled around; someone recommended changing hadoop.tmp.dir to
>This way it almost works; the thing is
>if I submit as userA it will create /tmp/hadoop on the local machine,
>owned by userA.userA,
>and once I try to submit a job from the same machine as userB I will
>get  "Error creating temp dir in hadoop.tmp.dir /tmp/hadoop due to
>Permission denied"
>(because /tmp/hadoop is owned by userA.userA). Vice versa, if I delete
>/tmp/hadoop and let the directory be created by userB, userA will not
>be able to submit jobs.
>Which is the right approach for me to use?
>Please suggest.
>On Mon, Oct 15, 2012 at 3:18 PM, Harsh J <harsh@cloudera.com> wrote:
>> Hi Patai,
>> Reply inline.
>> On Tue, Oct 16, 2012 at 2:57 AM, Patai Sangbutsarakum
>> <silvianhadoop@gmail.com> wrote:
>>> Thanks for input,
>>> I am reading the document; I forgot to mention that I am on CDH3u4.
>> That version should have the support for all of this.
>>>> If you point your poolname property to mapred.job.queue.name, then you
>>>> can leverage the Per-Queue ACLs
>>> Does that mean that if I plan to have 3 fair scheduler pools, I have to
>>> configure 3 capacity scheduler queues so that each pool
>>> can leverage the Per-Queue ACL of its queue?
>> Queues are not hard-tied into CapacityScheduler. You can have generic
>> queues in MR. And FairScheduler can bind its Pool concept into the
>> Queue configuration.
>> All you need to do is the following:
>> 1. Map FairScheduler pool names onto the queue names themselves:
>> mapred.fairscheduler.poolnameproperty set to 'mapred.job.queue.name'
>> 2. Define your required queues:
>> mapred.queue.names set to "default,foo,bar" for example, for 3 queues:
>> default, foo and bar.
>> 3. Define Submit ACLs for each Queue:
>> mapred.queue.default.acl-submit-job set to "patai,foobar users,adm"
>> (usernames groupnames)
>> mapred.queue.foo.acl-submit-job set to "spam eggs"
>> Likewise for remaining queues, as you need it...
>> 4. Enable ACLs and restart JT.
>> mapred.acls.enabled set to "true"
>> 5. Users then use the right API to set queue names before submitting
>> jobs, or use -Dmapred.job.queue.name=value via CLI (if using Tool):
>> 6. Done.
>> Let us know if this works!
>> --
>> Harsh J
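
For reference, steps 1 through 4 of Harsh's reply could be written as configuration roughly like the fragment below. This is only a sketch under a few assumptions: the fair scheduler is already enabled on the JobTracker, the queue list is set through mapred.queue.names (the name used in Hadoop 1.x's mapred-default.xml as far as I know, so check your version's defaults), and everything is shown in one file for brevity even though Hadoop 1.x layouts usually keep the per-queue ACL properties in mapred-queue-acls.xml. The user, group, and queue names are just the examples from the reply.

```xml
<?xml version="1.0"?>
<!-- Sketch only: all values come from the examples in the reply above. -->
<configuration>
  <!-- Step 1: derive the FairScheduler pool name from the job's queue name. -->
  <property>
    <name>mapred.fairscheduler.poolnameproperty</name>
    <value>mapred.job.queue.name</value>
  </property>

  <!-- Step 2: declare the queues. The property name is an assumption;
       verify it against the defaults shipped with your version. -->
  <property>
    <name>mapred.queue.names</name>
    <value>default,foo,bar</value>
  </property>

  <!-- Step 3: submit ACLs. The value is a comma-separated user list,
       a single space, then a comma-separated group list. -->
  <property>
    <name>mapred.queue.default.acl-submit-job</name>
    <value>patai,foobar users,adm</value>
  </property>
  <property>
    <name>mapred.queue.foo.acl-submit-job</name>
    <value>spam eggs</value>
  </property>

  <!-- Step 4: turn ACL checking on, then restart the JobTracker. -->
  <property>
    <name>mapred.acls.enabled</name>
    <value>true</value>
  </property>
</configuration>
```

With settings like these, a job submitted with -Dmapred.job.queue.name=foo would be placed in the pool named foo, and the submission would be checked against the "spam eggs" ACL (user spam, group eggs).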
