hadoop-common-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-7214) Hadoop /usr/bin/groups equivalent
Date Thu, 07 Apr 2011 00:42:06 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016632#comment-13016632 ]

Aaron T. Myers commented on HADOOP-7214:

Allen, I've previously seen you describe Hadoop as "essentially a distributed, networked operating
system." I agree with that assessment. Presently, this OS provides {{chown}}, {{chgrp}}, and
{{chmod}}, all of which presuppose the existence of groups associated with a user. However,
this OS doesn't provide a way for a user to find out what groups they're a member of from
the OS's perspective. I'm surprised you're resistant to adding this functionality. It seems
to me to be a simple deficiency.

Detailed responses to your concerns are inline.

bq. My conclusion is simple: if there is an easy fix for this, go for it. But if we're inventing
a bunch of tool-age to support brokenness, I think it is bad to add the weight long term (yes,
this includes RESTs, RPCs, etc, etc).

You still haven't answered my question: which of the setups I described above do you consider
"broken"?

As I said previously, I can probably agree that having different users/groups on the NN vs
the JT is indeed a misconfiguration, and we shouldn't concern ourselves with that scenario.
But, do you also consider having different users/groups on the client machine vs the NN to
be a misconfiguration? That seems like a perfectly reasonable setup to me, and one that we
should support.

bq. BTW, it is also perfectly reasonable to expect that companies that decide to have split
naming services to provide ways to query that information on their own.

Perhaps, but Hadoop also supports making the user -> group mapping service pluggable via
the {{hadoop.security.group.mapping}} configuration parameter. Why should we require implementers
of this to provide a way of querying this information on their own, through some other mechanism,
rather than have Hadoop show it? When a Hadoop user gets a "permission denied" error from
a Hadoop command, and wants to know what groups Hadoop thinks they belong to, they'll have
to run "{{random-command-x}}" rather than something simple like "{{hadoop fs -groups}}". That
only seems to make Hadoop harder to use.
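
To make the argument concrete, here is a minimal sketch of the idea, not Hadoop code. The names {{GroupMapping}}, {{StaticGroupMapping}}, and {{groups_command}} are hypothetical stand-ins for whatever provider {{hadoop.security.group.mapping}} points at and for the proposed command. The only point is that once the mapping is pluggable, a single built-in command can report its answer regardless of the backing implementation.

```python
from typing import Dict, List, Protocol


class GroupMapping(Protocol):
    """Hypothetical stand-in for a pluggable user -> groups provider."""

    def get_groups(self, user: str) -> List[str]: ...


class StaticGroupMapping:
    """Toy provider backed by an in-memory table (illustrative only)."""

    def __init__(self, table: Dict[str, List[str]]) -> None:
        self._table = table

    def get_groups(self, user: str) -> List[str]:
        # Unknown users resolve to no groups rather than raising.
        return list(self._table.get(user, []))


def groups_command(mapping: GroupMapping, user: str) -> str:
    """Format the result in the style of /usr/bin/groups: 'user : g1 g2'."""
    return f"{user} : {' '.join(mapping.get_groups(user))}"


# The server-side view of the mapping, which may differ from the client's.
namenode_mapping = StaticGroupMapping({"alice": ["analysts", "etl"]})
print(groups_command(namenode_mapping, "alice"))
```

Swapping {{StaticGroupMapping}} for any other provider with the same method leaves {{groups_command}} unchanged, which is why the query belongs next to the pluggable mapping rather than in a site-specific tool.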

bq. There could be some potential security issues that we might be circumventing by providing
that information out-of-band.

Hadoop assumes that file system implementations are capable of associating files and directories
with users and groups, as HDFS does. That's already part of the existing Hadoop commands.
A user could presently determine which groups they're a member of by creating a file and then
trying to {{chgrp}} it to each candidate group name in turn. The set of group names for which the {{chgrp}} succeeded
would be the set of groups the user is a member of. Obviously, this isn't feasible for a normal
user to do when they get a {{PermissionDeniedException}}, but it's perfectly reasonable for
an attacker to do.
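
The probing approach can be sketched as follows. This is a toy model, not HDFS: {{ToyFileSystem}} and {{probe_groups}} are hypothetical names, and the model assumes only the rule described above, namely that a file's owner may {{chgrp}} it to a group they belong to and is refused otherwise.

```python
from typing import Dict, Iterable, List, Set


class ToyFileSystem:
    """Illustrative model: chgrp succeeds only for groups the owner is in."""

    def __init__(self, memberships: Dict[str, Set[str]]) -> None:
        # Server-side view of memberships; callers cannot read it directly.
        self._memberships = memberships
        self._owner_of: Dict[str, str] = {}
        self._group_of: Dict[str, str] = {}

    def create(self, path: str, user: str) -> None:
        self._owner_of[path] = user

    def chgrp(self, path: str, user: str, group: str) -> bool:
        # Only the owner may change the group, and only to one of their own.
        if self._owner_of.get(path) != user:
            return False
        if group not in self._memberships.get(user, set()):
            return False
        self._group_of[path] = group
        return True


def probe_groups(fs: ToyFileSystem, user: str, candidates: Iterable[str]) -> List[str]:
    """Recover a user's groups purely from chgrp success or failure."""
    path = f"/tmp/{user}-probe"
    fs.create(path, user)
    return [group for group in candidates if fs.chgrp(path, user, group)]


fs = ToyFileSystem({"alice": {"analysts", "etl"}})
print(probe_groups(fs, "alice", ["admin", "analysts", "etl", "ops"]))
```

The caller never reads the membership table, yet ends up with the same information, one guess per candidate name.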

My point is just that Hadoop isn't hiding this information as it stands. Hadoop makes decisions
based on the groups a user belongs to, so we should make it easy for our users to find out
what groups Hadoop thinks they belong to.

bq. The other thing to keep in mind that going down the path of 'hadoop groups' is too limiting.
If we are going to provide group information, why not also provide uid, username, etc.

Showing the username seems reasonable to me, and in fact the patch I'm working on displays
this. Hadoop doesn't make decisions based on one's UID, so why should we show that?

bq. In the case of a kerberized environment, there is no guarantee that the TGT info matches
what is actually executed on the compute nodes due to remapping...

I don't follow this reasoning. Kerberos doesn't have any notion of groups. However, the first
component of the Kerberos principal name is used as the username when the NN and JT determine
a user's groups. I don't see why we would need to account for anything differently with or without
Kerberos support enabled.
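
As a rough illustration of what "first component of the principal name" means, here is a small sketch. The function name {{short_name}} and the example principals are made up for this comment, and the sketch deliberately ignores any remapping rules a site may configure; it only covers the simple case described above.

```python
def short_name(principal: str) -> str:
    """Return the first component of a Kerberos principal name.

    A principal has the form primary[/instance][@REALM]; only the
    primary is kept, and it is what the group lookup would be keyed on.
    """
    without_realm = principal.split("@", 1)[0]  # drop the realm, if any
    return without_realm.split("/", 1)[0]       # drop the instance, if any


for principal in ("alice@EXAMPLE.COM", "alice/host.example.com@EXAMPLE.COM"):
    print(principal, "->", short_name(principal))
```

Both example principals reduce to {{alice}}, so the group lookup proceeds exactly as it would without Kerberos.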

> Hadoop /usr/bin/groups equivalent
> ---------------------------------
>                 Key: HADOOP-7214
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7214
>             Project: Hadoop Common
>          Issue Type: New Feature
>    Affects Versions: 0.23.0
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
> Since user -> groups resolution is done on the NN and JT machines, there should be
> a way for users to determine what groups they're a member of from the NN's and JT's perspective.
