hadoop-common-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-7214) Hadoop /usr/bin/groups equivalent
Date Fri, 15 Apr 2011 07:32:05 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020221#comment-13020221 ]

Aaron T. Myers commented on HADOOP-7214:

Thanks a lot for the comments, Sanjay.

Group resolution used to be done client-side. When this was the case, calling {{/usr/bin/groups}}
was the correct method of determining which groups a user belonged to from HDFS's perspective,
because these were indeed the groups that HDFS used to determine file access. With the introduction
of Hadoop's security features, group resolution was moved server-side, but no method was provided
for a user to determine what groups they belong to from Hadoop's perspective. This JIRA seeks
to address this deficiency.

Detailed responses inline.

bq. The main concern I have here is that Hadoop permissions and security was very explicitly
designed to NOT manage user accounts or group accounts. Hadoop uses the accounts (user and
group) from the environment in which Hadoop is deployed. This has many advantages for deploying
and using Hadoop.

I completely agree. With the addition of this command, Hadoop is still not _managing_ the
user/group mapping - it's just reading them, as it was before. I'm certainly not proposing
that we create a command to let a user or admin mutate the user -> group mapping via Hadoop,
or to add a Hadoop-specific user/group database. Just to let a user see what groups they belong
to from Hadoop's perspective.

I see this as being no different than the way {{/usr/bin/groups}} presently works. By default,
the groups on the machine are managed locally via {{/etc/group}}. Optionally, you can configure
a different back-end database (e.g. LDAP, NIS) to provide the user/group mapping for the machine,
by editing {{/etc/nsswitch.conf}}, in which case the groups are no longer managed locally.
But even if you do this, {{/usr/bin/groups}} will still work.

bq. It seems that one needs a library that uses the same plugin that the NN or JT uses. The
command can call this library.

This won't work for the default case, i.e. when the NN/JT are using the {{ShellBasedUnixGroupsMapping}}
and a user is interacting with the NN/JT from a remote machine which happens to not have the
same user/group mappings as the NN/JT. If the client were to just use this plugin, they would
get the user/group mapping on the client machine, not the ones that matter - the ones on the
NN/JT.
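
To make the mismatch concrete, here is a minimal sketch rather than Hadoop code. {{GroupLookup}}, {{GroupViews}}, the user {{alice}} and the group names are all hypothetical; each table stands in for the user/group database of one machine. The same lookup logic gives different answers depending on where it runs, and only the NN/JT answer affects access decisions.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a group-mapping plugin; this is NOT Hadoop's interface.
interface GroupLookup {
    List<String> groupsFor(String user);
}

final class GroupViews {
    // Builds a lookup backed by a fixed table, standing in for one machine's
    // user/group database (local files, LDAP, NIS, ...).
    static GroupLookup fromTable(Map<String, List<String>> table) {
        return user -> table.getOrDefault(user, List.of());
    }

    public static void main(String[] args) {
        // Assumed example data: the client and the NameNode disagree about alice.
        GroupLookup client = fromTable(Map.of("alice", List.of("alice", "staff")));
        GroupLookup nameNode = fromTable(Map.of("alice", List.of("alice", "hdfs-users")));

        // Running the plugin on the client only ever reports the first view.
        System.out.println("client view:   " + client.groupsFor("alice"));
        System.out.println("NameNode view: " + nameNode.groupsFor("alice"));
    }
}
```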

bq. Allen's point is correct that one has to correctly configure all Hadoop components to
pull the membership from the same source.

I agree with this point as well. The question is what qualifies as a "correct" configuration.
I claim that having distinct user/group mappings on the client vs. the master servers is a
perfectly valid configuration. Sure, it's a little more difficult to reason about, since a
user will need to concern themselves with two sets of user/group mappings - local and remote.
Part of why this is hard to reason about is precisely that a user presently has
no way of determining what groups they belong to from Hadoop's perspective.

bq. Thinking a little bit further, one could argue that since Hadoop derives user accounts
and membership from its environment, one should ask the environment about group membership:
that is, use the groups command of your environment (typically a unix system).

This won't work for the case I'm concerned about. The user would need to {{ssh}} into the
NN or JT (which we usually recommend disabling shell access to) to determine the groups they
belong to.

bq. Perhaps the main motivation for Hadoop-groups-command is to detect if Hadoop has been
correctly configured for the environment.

This is certainly part of the motivation, but by no means the only one. Without a command
like this, if a Hadoop admin were attempting to configure a custom {{GroupMappingServiceProvider}},
how would they determine if it were working?
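
Given some way to ask the NN/JT for a user's groups, the check an admin would want is simply a comparison of resolved groups against expected ones. A rough sketch of that comparison follows. {{GroupProvider}} and {{ProviderCheck}} are hypothetical names, and the single-method interface is a simplified stand-in for whatever returns the server-side answer, not Hadoop's actual {{GroupMappingServiceProvider}}.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical single-method stand-in for a source of server-side group lookups.
interface GroupProvider {
    List<String> getGroups(String user);
}

final class ProviderCheck {
    // Returns one message per user whose resolved groups differ from the
    // expected set; an empty list means every expectation was met.
    static List<String> mismatches(GroupProvider provider,
                                   Map<String, Set<String>> expected) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : expected.entrySet()) {
            // Compare as sets so that ordering and duplicates do not matter.
            Set<String> actual = new HashSet<>(provider.getGroups(e.getKey()));
            if (!actual.equals(e.getValue())) {
                problems.add(e.getKey() + ": expected " + e.getValue()
                        + " but got " + actual);
            }
        }
        return problems;
    }
}
```

Without a command that exposes the server-side view, there is nothing an admin can plug in as the provider here short of logging in to the NN or JT.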

> Hadoop /usr/bin/groups equivalent
> ---------------------------------
>                 Key: HADOOP-7214
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7214
>             Project: Hadoop Common
>          Issue Type: New Feature
>    Affects Versions: 0.23.0
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>         Attachments: hadoop-7214.0.txt, hadoop-7214.1.txt, hadoop-7214.2.txt, hadoop-7214.3.txt,
> Since user -> groups resolution is done on the NN and JT machines, there should be
> a way for users to determine what groups they're a member of from the NN's and JT's perspective.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
