hadoop-common-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-7214) Hadoop /usr/bin/groups equivalent
Date Thu, 07 Apr 2011 06:03:08 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016707#comment-13016707 ]

Allen Wittenauer commented on HADOOP-7214:
------------------------------------------

bq. However, this OS doesn't provide a way for a user to find out what groups they're a member
of from the OS's perspective.

Hadoop doesn't ultimately 'own' those resources.  They come from an external source.  Where
does username to uid mapping kick in for running tasks?   All of these mappings sit outside
Hadoop.

bq. But, do you also consider having different users/groups on the client machine vs the NN
to be a misconfiguration? That seems like a perfectly reasonable setup to me, and one that
we should support.

Yes, very much so. It breaks a lot of different services (not just Hadoop) and is confusing
to users. If one insists on this foolishness, there are other ways to solve this problem that
don't involve us. This is a snowflake that could easily turn into an avalanche.

FWIW, we also allow laptops to connect directly to one of our Hadoop grids. So yes, I'm very
familiar and have thought a lot about this particular problem already. That's why we have
other services that allow users to see what groups they belong to, their user id, etc, etc.

Again: why do we need to provide a solution inherent in the software for what is ultimately a problem
that a) is much larger than our software and b) can be solved without us doing anything?
Just because we *can* do something doesn't mean we *should*.

bq. Perhaps, but Hadoop also supports making the user -> group mapping service pluggable
via the {{hadoop.security.group.mapping}} configuration parameter. Why should we require implementers
of this to provide a way of querying this information on their own, through some other mechanism,
rather than have Hadoop show it? When a Hadoop user gets a "permission denied" error from
a Hadoop command, and wants to know what groups Hadoop thinks they belong to, they'll have
to run "{{random-command-x}}" rather than something simple like "{{hadoop fs -groups}}". That
only seems to make Hadoop harder to use.

If someone writes a pluggable module, then this is something that needs to get factored into
the cost of using that plug-in. What happens if those groups aren't in a displayable format?

Also, what happens if they aren't using the command line?  Are we going to write a JSP too?
This is going to quickly balloon out of control. :(

bq. Hadoop assumes that file system implementations are capable of associating files and directories
with users and groups, as HDFS does. 

Sort of.  There is no reason why a file system's implementation of users and groups couldn't
be a nop.  (Actually, isn't that the case for S3, Cassandra, and a few other non-POSIX-likes
already?)  What do we display in the case where the group is useless information for the file
system in use? 

bq. My point is just that Hadoop isn't hiding this information as it stands. Hadoop makes
decisions based on the groups a user belongs to, so we should make it easy for our users to
find out what groups Hadoop thinks they belong to.

...except Hadoop is told what groups a user belongs to by an external source. Why shouldn't
it be the responsibility of the external source to share this information?  We're the consumer,
not the provider when it comes to naming services.
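
To illustrate the point that the naming service, not Hadoop, is the source of truth, here is a minimal sketch (Python, Unix only) of asking the local OS which groups it reports for a user. It only reflects the machine it runs on; a NN or JT with a different naming configuration or a custom {{hadoop.security.group.mapping}} plug-in may resolve different groups. The helper name {{os_groups_for}} is hypothetical and not part of Hadoop.

```python
import grp
import os
import pwd


def os_groups_for(username):
    """Return the sorted group names the local OS reports for username.

    Hypothetical helper for illustration only; it queries the local
    naming service (passwd/group databases), not Hadoop.
    """
    user = pwd.getpwnam(username)
    # os.getgrouplist (Unix only) returns the gids the user belongs to,
    # including the primary gid passed as the second argument.
    gids = os.getgrouplist(username, user.pw_gid)
    names = []
    for gid in gids:
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            # The gid has no entry in the group database; show the number.
            names.append(str(gid))
    return sorted(set(names))


if __name__ == "__main__":
    me = pwd.getpwuid(os.getuid()).pw_name
    print(me, ":", " ".join(os_groups_for(me)))
```

Run on the client and again on the NN host, this would show whether the two machines agree, which is the misconfiguration question raised earlier in the thread.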

bq. Showing the username seems reasonable to me, and in fact the patch I'm working on displays
this. Hadoop doesn't make decisions based on one's UID, so why should we show that?

...

bq. I don't follow this reasoning. Kerberos doesn't have any notion of groups. But, the first
component of the Kerberos principal name is used as the username when the NN and JT determine
a user's groups. I don't see how we need to account for anything differently with or without
Kerberos support enabled.

Look at the bigger picture and not just focus on groups for a second:

Let's say I fire my job off with a principal of user/joe.  But thanks to remapping (HADOOP-6526),
the task actually gets run as username fred with a uid of 50. I access a file on the local
system (or heck, even NFS) that is not readable by fred/50.  Using the same logic of "oh noes
users don't know their groups", we should be reporting this other information too.

This is a slippery slope and I really really don't think we want to go down this road.

(PS, some Kerberos implementations actually do pass group information along...)

> Hadoop /usr/bin/groups equivalent
> ---------------------------------
>
>                 Key: HADOOP-7214
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7214
>             Project: Hadoop Common
>          Issue Type: New Feature
>    Affects Versions: 0.23.0
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>         Attachments: hadoop-7214.0.txt, hadoop-7214.1.txt
>
>
> Since user -> groups resolution is done on the NN and JT machines, there should be
> a way for users to determine what groups they're a member of from the NN's and JT's perspective.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
