hadoop-common-dev mailing list archives

From s29752-hadoop...@yahoo.com
Subject Re: mapreduce does the wrong thing with dfs permissions?
Date Thu, 28 Feb 2008 21:45:28 GMT
Hi Andy,

I can reproduce the problem and I believe it is a bug. The output directory
should be owned by the user submitting the job, not by the task tracker
account. Do you want to file a JIRA, or shall I do it?
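On the umask part of the question below: Hadoop releases of this era ship a
`dfs.umask` property in hadoop-default.xml (worth confirming the property
exists in your particular release) that can be overridden in hadoop-site.xml.
A sketch, assuming that property name:

```xml
<!-- hadoop-site.xml: set the cluster-wide creation umask.
     002 clears only the other-write bit, so new files and
     directories come out rwxrwxr-x instead of rwxr-xr-x. -->
<property>
  <name>dfs.umask</name>
  <value>002</value>
</property>
```

This only changes the default mode of newly created paths; it does not fix
the ownership problem described below.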



----- Original Message ----
From: Andy Li <annndy.lee@gmail.com>
To: core-dev@hadoop.apache.org
Sent: Wednesday, February 27, 2008 1:22:26 AM
Subject: Re: mapreduce does the wrong thing with dfs permissions?

I also encountered the same problem when running MapReduce code under a different user name.

For example, assume I installed Hadoop under the account 'hadoop' and I run the
program as user 'test'. I created the input folder /user/test/input/ as user
'test', with the permission set to 0775:
/user/test/input      <dir>           2008-02-27 01:20        rwxr-xr-x       test    hadoop

When I run the MapReduce job, the output directory I specify ends up owned by
user 'hadoop' instead of 'test':

${HADOOP_HOME}/bin/hadoop jar /tmp/test_perm.jar -m 57 -r 3 "/user/test/input/l" "/user/test/output/"

The directory "/user/test/output/" ends up with the following permissions and ownership:

/user/test/output     <dir>           2008-02-27 03:53        rwxr-xr-x       hadoop  hadoop

My question is: why is the output folder owned by the superuser 'hadoop'?
Because of that ownership, the MapReduce code cannot write to the folder: the
permissions do not allow user 'test' to write there. So the output folder was
created, but user 'test' cannot write anything into it, and the job therefore
threw an exception. See the exception trace below.

I have been looking for a way to solve this, but cannot find an exact answer.
How do I make new files and directories default to 0775 (i.e., a umask of
0002)? I can add user 'test' to group 'hadoop' so that 'test' has write access
to folders in the 'hadoop' group. In other words, as long as a folder is
'rwxrwxr-x', user 'test' can read and write it while sharing it with
'hadoop:hadoop'. Is there a way to set or modify the global default umask for
Hadoop, or do I always have to override the default umask in my own
configuration or FileSystem?

======= COPY/PASTE STARTS HERE =======
org.apache.hadoop.fs.permission.AccessControlException: Permission denied:
user=test, access=WRITE,
        at org.apache.hadoop.dfs.PermissionChecker.check(
        at org.apache.hadoop.dfs.PermissionChecker.check(
        at org.apache.hadoop.dfs.PermissionChecker.checkPermission(
        at org.apache.hadoop.dfs.FSNamesystem.checkPermission(
        at org.apache.hadoop.dfs.FSNamesystem.checkAncestorAccess(
        at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(
        at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java
        at org.apache.hadoop.dfs.NameNode.create(NameNode.java:281)
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:899)

        at org.apache.hadoop.ipc.Client.call(Client.java:512)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
        at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(
        at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(
        at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:382)
        at org.apache.hadoop.dfs.DistributedFileSystem.create(
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:436)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:336)
        at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:308)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java
======= COPY/PASTE ENDS HERE =======
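Independent of the ownership bug, the umask arithmetic behind the question
above is simple: the effective mode is the requested mode with the umask bits
cleared. A minimal sketch in plain Python (not Hadoop code) of why a 022 umask
yields the rwxr-xr-x seen in the listings, and why 002 would leave the
group-write bit that user 'test' needs:

```python
def apply_umask(mode: int, umask: int) -> int:
    """Clear the umask bits from the requested mode --
    the same rule POSIX and HDFS permissions follow."""
    return mode & ~umask

def to_rwx(mode: int) -> str:
    """Render a 9-bit mode as the familiar rwxrwxrwx string."""
    bits = "rwxrwxrwx"
    return "".join(b if mode & (1 << (8 - i)) else "-" for i, b in enumerate(bits))

# Default umask 022 explains the rwxr-xr-x on /user/test/output:
print(to_rwx(apply_umask(0o777, 0o022)))  # rwxr-xr-x
# A umask of 002 would leave the group-write bit the poster wants:
print(to_rwx(apply_umask(0o777, 0o002)))  # rwxrwxr-x
```

Note that even with a 002 umask, writes by 'test' into a 'hadoop'-owned
directory still require 'test' to be a member of the owning group.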

On Tue, Feb 26, 2008 at 9:47 PM, Owen O'Malley <oom@yahoo-inc.com> wrote:

> On Feb 26, 2008, at 3:05 PM, Michael Bieniosek wrote:
> > Ah, that makes sense.
> >
> > I have things set up this way because I can't trust code that gets
> > run on the tasktrackers: we have to prevent the tasktrackers from
> > e.g. sending kill signals to the datanodes.  I didn't think about
> > the jobtracker, but I suppose I should equally not trust code that
> > gets run on the jobtracker...
> Just to be clear, no user code is run in the JobTracker or
> TaskTracker. User code is only run in the client and task processes.
> However, it makes a lot of sense to run map/reduce as a different
> user than hdfs to prevent the task processes from having access to
> the raw blocks or datanodes.
> -- Owen
