hadoop-common-dev mailing list archives

From "Andy Li" <annndy....@gmail.com>
Subject Re: mapreduce does the wrong thing with dfs permissions?
Date Thu, 28 Feb 2008 23:05:31 GMT
Thanks, please go ahead and file the JIRA.  I have no experience filing one
(I'll read the manual to learn how).

I'll keep track of this and apply the patch later on.  Thanks for taking
care of this issue.

On Thu, Feb 28, 2008 at 2:32 PM, <s29752-hadoopdev@yahoo.com> wrote:

> I have just created HADOOP-2915.
>
>
> ----- Original Message ----
> From: "s29752-hadoopdev@yahoo.com" <s29752-hadoopdev@yahoo.com>
> To: core-dev@hadoop.apache.org
> Sent: Thursday, February 28, 2008 1:45:28 PM
> Subject: Re: mapreduce does the wrong thing with dfs permissions?
>
> Hi Andy,
>
> I can reproduce the problem and I believe it is a bug.  The output
> directory should be owned by the user submitting the job, not the task
> tracker account.  Do you want to file a JIRA, or should I do it?
>
> Thanks.
>
> Nicholas
>
>
> ----- Original Message ----
> From: Andy Li <annndy.lee@gmail.com>
> To: core-dev@hadoop.apache.org
> Sent: Wednesday, February 27, 2008 1:22:26 AM
> Subject: Re: mapreduce does the wrong thing with dfs permissions?
>
> I also ran into the same problem when running MapReduce jobs under a
> different user name.
>
> For example, suppose I installed Hadoop under the account 'hadoop' and I
> run my program as user 'test'.  As user 'test' I created the input folder
> /user/test/input/ and set its permission to 0775:
> /user/test/input      <dir>           2008-02-27 01:20        rwxr-xr-x       test  hadoop
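>
> For reference, one way the folder above could be created with that
> permission from Java, through the FileSystem API (an untested sketch; I am
> assuming the mkdirs/setPermission/FsPermission calls look like this in the
> current release):
>
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.fs.FileSystem;
>     import org.apache.hadoop.fs.Path;
>     import org.apache.hadoop.fs.permission.FsPermission;
>
>     public class MakeInputDir {
>       public static void main(String[] args) throws Exception {
>         Configuration conf = new Configuration();
>         // Connects to the DFS as the user running this program, e.g. 'test'.
>         FileSystem fs = FileSystem.get(conf);
>         Path input = new Path("/user/test/input");
>         fs.mkdirs(input);  // created with the cluster's default umask
>         // Then force the folder to rwxrwxr-x (0775), whatever the umask was.
>         fs.setPermission(input, new FsPermission((short) 0775));
>       }
>     }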
>
> When I run the MapReduce job, the output directory I specify ends up owned
> by user 'hadoop' instead of 'test':
> ${HADOOP_HOME}/bin/hadoop jar /tmp/test_perm.jar -m 57 -r 3 "/user/test/input/l" "/user/test/output/"
>
> The directory "/user/test/output/" will have the following permission and
> user:group:
> /user/test/output    <dir>           2008-02-27 03:53        rwxr-xr-x       hadoop  hadoop
>
> My question is: why is the output folder owned by the super user 'hadoop'?
> Because its permission does not allow user 'test' to write, the MapReduce
> job cannot use it: the output folder is created, but user 'test' cannot
> write anything into it, so the job throws the exception shown below.
>
> I have been looking for a solution but cannot find a definite answer.
> How do I make new directories default to 0775 (that is, a umask of 002)?
> I can add user 'test' to group 'hadoop' so that 'test' has write access to
> folders owned by the 'hadoop' group.  In other words, as long as a folder
> is set to 'rwxrwxr-x', user 'test' can read and write it while it stays
> shared as 'hadoop:hadoop'.  Is there a way to set or modify the global
> default umask for Hadoop, or do I have to override the default umask value
> in my configuration or FileSystem calls every time?
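>
> Something along these lines is what I have in mind for the configuration
> route (just an untested sketch; the 'dfs.umask' property name, its value
> format, and the FsPermission calls are my guesses for this release, so
> please check hadoop-default.xml before relying on it):
>
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.fs.FileSystem;
>     import org.apache.hadoop.fs.Path;
>     import org.apache.hadoop.fs.permission.FsPermission;
>
>     public class OutputPermissions {
>       public static void main(String[] args) throws Exception {
>         Configuration conf = new Configuration();
>         // Option 1: override the default umask in the client configuration.
>         // A umask of 002 should make new directories default to rwxrwxr-x.
>         conf.set("dfs.umask", "002");
>
>         // Option 2: relax the permission on the existing output folder so
>         // that members of the 'hadoop' group (including 'test') can write.
>         // Note: this has to run as the folder's owner (here 'hadoop'),
>         // since only the owner or the super user may change permissions.
>         FileSystem fs = FileSystem.get(conf);
>         Path output = new Path("/user/test/output");
>         if (fs.exists(output)) {
>           fs.setPermission(output, new FsPermission((short) 0775));
>         }
>       }
>     }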
>
> ======= COPY/PASTE STARTS HERE =======
> org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=test, access=WRITE, inode="_task_200802262256_0007_r_000001_1":hadoop:hadoop:rwxr-xr-x
>        at org.apache.hadoop.dfs.PermissionChecker.check(PermissionChecker.java:173)
>        at org.apache.hadoop.dfs.PermissionChecker.check(PermissionChecker.java:154)
>        at org.apache.hadoop.dfs.PermissionChecker.checkPermission(PermissionChecker.java:102)
>        at org.apache.hadoop.dfs.FSNamesystem.checkPermission(FSNamesystem.java:4035)
>        at org.apache.hadoop.dfs.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4005)
>        at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:963)
>        at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:938)
>        at org.apache.hadoop.dfs.NameNode.create(NameNode.java:281)
>        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:899)
>
>        at org.apache.hadoop.ipc.Client.call(Client.java:512)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
>        at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>        at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
>        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:1927)
>        at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:382)
>        at org.apache.hadoop.dfs.DistributedFileSystem.create(DistributedFileSystem.java:135)
>        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:436)
>        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:336)
>        at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:308)
>        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2089)
> ======= COPY/PASTE ENDS HERE =======
>
>
>
> On Tue, Feb 26, 2008 at 9:47 PM, Owen O'Malley <oom@yahoo-inc.com> wrote:
>
> >
> > On Feb 26, 2008, at 3:05 PM, Michael Bieniosek wrote:
> >
> > > Ah, that makes sense.
> > >
> > > I have things set up this way because I can't trust code that gets
> > > run on the tasktrackers: we have to prevent the tasktrackers from
> > > e.g. sending kill signals to the datanodes.  I didn't think about
> > > the jobtracker, but I suppose I should equally not trust code that
> > > gets run on the jobtracker...
> >
> > Just to be clear, no user code is run in the JobTracker or
> > TaskTracker. User code is only run in the client and task processes.
> > However, it makes a lot of sense to run map/reduce as a different
> > user than hdfs to prevent the task processes from having access to
> > the raw blocks or datanodes.
> >
> > -- Owen
> >
>
