From: "Goldstone, Robin J."
To: "user@hadoop.apache.org"
Subject: issue with permissions of mapred.system.dir
Date: Tue, 9 Oct 2012 23:44:34 +0000

I am bringing up a Hadoop cluster for the first time (but am an experienced sysadmin with lots of cluster experience) and running into an issue with permissions on mapred.system.dir. It has generally been a chore to figure out all the various directories that need to be created to get Hadoop working, some on the local FS, others within HDFS, getting the right ownership and permissions, etc. I think I am mostly there but can't seem to get past my current issue with mapred.system.dir.

Some general info first:
OS: RHEL6
Hadoop version: hadoop-1.0.3-1.x86_64

20 node cluster configured as follows
1 node as primary namenode
1 node as secondary namenode + job tracker
18 nodes as datanode + tasktracker

I have HDFS up and running and have the following in mapred-site.xml:
<property>
  <name>mapred.system.dir</name>
  <value>hdfs://hadoop1/mapred</value>
  <description>Shared data for JT - this must be in HDFS</description>
</property>
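
For reference, a minimal sketch of reading this setting back out of a Hadoop site file. It assumes the fragment above sits inside the usual `<configuration>` root element; the `SAMPLE` string and the `read_property` helper are hypothetical and are not part of Hadoop.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal mapred-site.xml. Only the <property> element is taken
# from the post; the <configuration> wrapper is the usual Hadoop site-file root.
SAMPLE = """<configuration>
  <property>
    <name>mapred.system.dir</name>
    <value>hdfs://hadoop1/mapred</value>
    <description>Shared data for JT - this must be in HDFS</description>
  </property>
</configuration>"""


def read_property(xml_text, key):
    """Return the value of the named property, or None if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == key:
            return prop.findtext("value")
    return None


print(read_property(SAMPLE, "mapred.system.dir"))
```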

I have created this directory in HDFS, owner mapred:hadoop, permissions 700 which seems to be the most common recommendation amongst multiple, often conflicting articles about how to set up Hadoop. Here is the top level of my filesystem:
hyperion-hdp4@hdfs:hadoop fs -ls /
Found 3 items
drwx------   - mapred hadoop          0 2012-10-09 12:58 /mapred
drwxrwxrwx   - hdfs   hadoop          0 2012-10-09 13:00 /tmp
drwxr-xr-x   - hdfs   hadoop          0 2012-10-09 12:51 /user

Note that it doesn't really seem to matter what permissions I set on /mapred, since the JobTracker changes them to 700 when it starts up.

However, when I try to run the hadoop example teragen program as a "regular" user I am getting this error:
hyperion-hdp4@robing:hadoop jar /usr/share/hadoop/hadoop-examples*.jar teragen -D dfs.block.size=536870912 10000000000 /user/robing/terasort-input
Generating 10000000000 using 2 maps with step of 5000000000
12/10/09 16:27:02 INFO mapred.JobClient: Running job: job_201210072045_0003
12/10/09 16:27:03 INFO mapred.JobClient:  map 0% reduce 0%
12/10/09 16:27:03 INFO mapred.JobClient: Job complete: job_201210072045_0003
12/10/09 16:27:03 INFO mapred.JobClient: Counters: 0
12/10/09 16:27:03 INFO mapred.JobClient: Job Failed: Job initialization failed:
org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=robing, access=EXECUTE, inode="mapred":mapred:hadoop:rwx------
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3251)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:713)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:182)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:555)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:536)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:443)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:435)
    at org.apache.hadoop.security.Credentials.writeTokenStorageFile(Credentials.java:169)
    at org.apache.hadoop.mapred.JobInProgress.generateAndStoreTokens(JobInProgress.java:3537)
    at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:696)
    at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:4207)
    at org.apache.hadoop.mapred.FairScheduler$JobInitializer$InitJob.run(FairScheduler.java:291)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
<rest of stack trace omitted>

This seems to be saying that it is trying to write to the HDFS /mapred filesystem as me (robing) rather than as mapred, the username under which the jobtracker and tasktracker run.

To verify this is what is happening, I manually changed the permissions on /mapred from 700 to 755 since it claims to want execute access:
hyperion-hdp4@mapred:hadoop fs -chmod 755 /mapred
hyperion-hdp4@mapred:hadoop fs -ls /
Found 3 items
drwxr-xr-x   - mapred hadoop          0 2012-10-09 12:58 /mapred
drwxrwxrwx   - hdfs   hadoop          0 2012-10-09 13:00 /tmp
drwxr-xr-x   - hdfs   hadoop          0 2012-10-09 12:51 /user
hyperion-hdp4@mapred:

Now I try running again and it fails again, this time complaining it wants write access to /mapred:
hyperion-hdp4@robing:hadoop jar /usr/share/hadoop/hadoop-examples*.jar teragen -D dfs.block.size=536870912 10000000000 /user/robing/terasort-input
Generating 10000000000 using 2 maps with step of 5000000000
12/10/09 16:31:29 INFO mapred.JobClient: Running job: job_201210072045_0005
12/10/09 16:31:30 INFO mapred.JobClient:  map 0% reduce 0%
12/10/09 16:31:30 INFO mapred.JobClient: Job complete: job_201210072045_0005
12/10/09 16:31:30 INFO mapred.JobClient: Counters: 0
12/10/09 16:31:30 INFO mapred.JobClient: Job Failed: Job initialization failed:
org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=robing, access=WRITE, inode="mapred":mapred:hadoop:rwxr-xr-x
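
Both denials are what a POSIX-style permission check would produce if it were applied to the user robing rather than to mapred. A minimal sketch of that check follows. It assumes robing is not a member of the hadoop group (the post does not say; the group name "users" is hypothetical) and it ignores the HDFS superuser bypass. The `permitted` helper is illustrative, not a Hadoop API.

```python
def permitted(mode, owner, group, user, user_groups, access):
    """POSIX-style check: select exactly one class (owner, group or other)
    and test only that class's bits.

    mode   -- nine-character string such as "rwx------"
    access -- "r", "w" or "x" (READ, WRITE, EXECUTE in the error messages)
    """
    if user == owner:
        bits = mode[0:3]
    elif group in user_groups:
        bits = mode[3:6]
    else:
        bits = mode[6:9]
    return access in bits


# Hypothetical group membership for robing; not stated in the post.
ROBING_GROUPS = {"users"}

for mode in ("rwx------", "rwxr-xr-x", "rwxrwxrwx"):
    can_traverse = permitted(mode, "mapred", "hadoop", "robing", ROBING_GROUPS, "x")
    can_create = permitted(mode, "mapred", "hadoop", "robing", ROBING_GROUPS, "w")
    print(mode, "EXECUTE:", can_traverse, "WRITE:", can_create)
```

Under this model the outcome for 700 and 755 is the same even if robing does belong to the hadoop group, because the group bits grant no more than the other bits in either mode.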

So I changed the permissions on /mapred to 777:
hyperion-hdp4@mapred:hadoop fs -chmod 777 /mapred
hyperion-hdp4@mapred:hadoop fs -ls /
Found 3 items
drwxrwxrwx   - mapred hadoop          0 2012-10-09 12:58 /mapred
drwxrwxrwx   - hdfs   hadoop          0 2012-10-09 13:00 /tmp
drwxr-xr-x   - hdfs   hadoop          0 2012-10-09 12:51 /user
hyperion-hdp4@mapred:

And then I run again and this time it works.
hyperion-hdp4@robing:hadoop jar /usr/share/hadoop/hadoop-examples*.jar teragen -D dfs.block.size=536870912 10000000000 /user/robing/terasort-input
Generating 10000000000 using 2 maps with step of 5000000000
12/10/09 16:33:02 INFO mapred.JobClient: Running job: job_201210072045_0006
12/10/09 16:33:03 INFO mapred.JobClient:  map 0% reduce 0%
12/10/09 16:34:34 INFO mapred.JobClient:  map 1% reduce 0%
12/10/09 16:35:52 INFO mapred.JobClient:  map 2% reduce 0%
etc…

And indeed I can see that there is stuff written to /mapred under my userid:
# hyperion-hdp4 /root > hadoop fs -ls /mapred
Found 2 items
drwxrwxrwx   - robing hadoop          0 2012-10-09 16:33 /mapred/job_201210072045_0006
-rw-------   2 mapred hadoop          4 2012-10-09 12:58 /mapred/jobtracker.info

However, manually setting the permissions to 777 is not a workable solution, since any time I restart the jobtracker it sets the permissions on /mapred back to 700.
 hyperion-hdp3 /root > hadoop fs -ls /
Found 3 items
drwxrwxrwx   - mapred hadoop          0 2012-10-09 16:33 /mapred
drwxrwxrwx   - hdfs   hadoop          0 2012-10-09 13:00 /tmp
drwxr-xr-x   - hdfs   hadoop          0 2012-10-09 12:51 /user
# hyperion-hdp3 /root > /etc/init.d/hadoop-jobtracker restart
Stopping Hadoop jobtracker daemon (hadoop-jobtracker): stopping jobtracker
                                                           [  OK  ]
Starting Hadoop jobtracker daemon (hadoop-jobtracker): starting jobtracker, logging to /var/log/hadoop/mapred/hadoop-mapred-jobtracker-hyperion-hdp3.out
                                                           [  OK  ]
# hyperion-hdp3 /root > hadoop fs -ls /
Found 3 items
drwx------   - mapred hadoop          0 2012-10-09 16:38 /mapred
drwxrwxrwx   - hdfs   hadoop          0 2012-10-09 13:00 /tmp
drwxr-xr-x   - hdfs   hadoop          0 2012-10-09 12:51 /user
# hyperion-hdp3 /root > 
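
The before/after comparison can also be made mechanical. The following is a minimal sketch that parses the two `hadoop fs -ls /` listings shown above and reports which paths changed mode across the restart. The helper names `mode_to_octal` and `parse_ls` are hypothetical, and the parser assumes the eight-column layout seen in this post.

```python
def mode_to_octal(mode):
    """Convert an ls-style mode such as 'drwx------' to an octal string like '700'."""
    bits = mode[1:]  # drop the leading file-type character
    digits = []
    for i in range(0, 9, 3):
        r, w, x = bits[i:i + 3]
        value = (4 if r == "r" else 0) + (2 if w == "w" else 0) + (1 if x == "x" else 0)
        digits.append(str(value))
    return "".join(digits)


def parse_ls(output):
    """Map path -> octal mode for each entry line of the listing."""
    modes = {}
    for line in output.splitlines():
        fields = line.split()
        # Entry lines have 8 columns and start with a 10-character mode string.
        if len(fields) == 8 and len(fields[0]) == 10 and fields[0][0] in "d-":
            modes[fields[7]] = mode_to_octal(fields[0])
    return modes


# Listings copied from the post: before and after the jobtracker restart.
BEFORE = """Found 3 items
drwxrwxrwx   - mapred hadoop          0 2012-10-09 16:33 /mapred
drwxrwxrwx   - hdfs   hadoop          0 2012-10-09 13:00 /tmp
drwxr-xr-x   - hdfs   hadoop          0 2012-10-09 12:51 /user"""

AFTER = """Found 3 items
drwx------   - mapred hadoop          0 2012-10-09 16:38 /mapred
drwxrwxrwx   - hdfs   hadoop          0 2012-10-09 13:00 /tmp
drwxr-xr-x   - hdfs   hadoop          0 2012-10-09 12:51 /user"""

before, after = parse_ls(BEFORE), parse_ls(AFTER)
changed = {p: (before[p], after[p]) for p in before if after.get(p) != before[p]}
print(changed)
```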

So my questions are:
  1. What are the right permissions on mapred.system.dir?
  2. If not 700, how do I get the job tracker to stop changing them to 700?
  3. If 700 is correct, then what am I doing wrong in my attempt to run the example teragen program?

Thank you in advance.
Robin Goldstone, LLNL
