hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4348) Adding service-level authorization to Hadoop
Date Thu, 23 Oct 2008 04:16:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642048#action_12642048 ]

Doug Cutting commented on HADOOP-4348:

Sanjay> The best way to represent that service access is when a service proxy object is
created - e.g when the connection is established.

A proxy is not bound to a single connection.  Connections are retrieved from a cache each
time a call is made. Different proxies may share the same connection, and a single proxy may
use different connections for different calls.
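
To make the proxy/connection relationship concrete, here is a minimal sketch.  All class and
method names (Connection, ConnectionCache, RpcProxy, evict) are hypothetical and are not
Hadoop's actual IPC classes; the sketch only assumes that connections are cached per server
address and looked up on every call.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a client-side connection to one server address.
class Connection {
    final String address;
    int callsSent = 0;

    Connection(String address) {
        this.address = address;
    }
}

// Hypothetical cache: one connection per address, created on first use.
class ConnectionCache {
    private final Map<String, Connection> connections = new HashMap<>();

    synchronized Connection get(String address) {
        return connections.computeIfAbsent(address, Connection::new);
    }

    // Dropping an entry (e.g. after an idle timeout) forces a new connection later.
    synchronized void evict(String address) {
        connections.remove(address);
    }

    synchronized int size() {
        return connections.size();
    }
}

// Hypothetical proxy: it remembers an address and a protocol, never a connection.
class RpcProxy {
    private final String protocol;
    private final String address;
    private final ConnectionCache cache;

    RpcProxy(String protocol, String address, ConnectionCache cache) {
        this.protocol = protocol;
        this.address = address;
        this.cache = cache;
    }

    // Each call fetches the connection from the cache again.
    Connection call(String method) {
        Connection connection = cache.get(address);
        connection.callsSent++;
        return connection;
    }
}
```

In this sketch two proxies for different protocols at the same address end up on one
connection, and a proxy that calls again after an eviction gets a different connection, so
there is no stable one-to-one binding between a proxy and a connection.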

Sanjay> We could share multiple service sessions in a single connection but that complexity
is not worth it.

It would be simpler to implement this way, not more complex.  In HADOOP-4049 it was considerably
simpler to pass extra data by modifying the RPC code than Client/Server.  That's my primary
motivation here: to keep the code simple.  So unless there's a reason why we must authorize
per connection rather than per request, it would be easier to authorize requests and would
better compartmentalize the code.  There are some performance implications.  Authorizing per
request will use fewer connections but perform more authorizations.  I don't know whether
this is significant.  I expect that ACLs will be cached, and that authorization will not be
too expensive, but that remains to be seen.  So performance may provide a motivation to authorize
per connection.  But let's not prematurely optimize.

Sanjay> I see your argument to be equivalent to arguing against service level authorization
and that method level authorization is sufficient.

No, but we will eventually probably need method-level authorization too, and it would be nice
if whatever support we add now also helps then.  If we do this in RPC, then we can examine
only the protocol name for now, and subsequently add method-level authorization at the same
place.  So implementing service-level authorization this way better prepares us for method-level
authorization.
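
As a rough illustration of authorizing each request at the RPC layer, the sketch below checks
only the protocol name unless a method-level entry exists.  RpcAuthorizer, its methods, and the
"protocol#method" key format are assumptions made for this example, not code from the patch;
the in-memory maps stand in for cached ACLs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical per-request authorizer that an RPC server could call before dispatch.
class RpcAuthorizer {
    // Protocol name -> users allowed to use that protocol (stand-in for cached ACLs).
    private final Map<String, Set<String>> protocolAcls = new HashMap<>();
    // "protocol#method" -> users allowed to call that method; empty until needed.
    private final Map<String, Set<String>> methodAcls = new HashMap<>();
    private int checks = 0;

    void allowProtocol(String protocol, String user) {
        protocolAcls.computeIfAbsent(protocol, k -> new HashSet<>()).add(user);
    }

    void allowMethod(String protocol, String method, String user) {
        methodAcls.computeIfAbsent(protocol + "#" + method, k -> new HashSet<>()).add(user);
    }

    // Called once per request: service-level check first, then the optional
    // method-level check at the same place.
    boolean isAuthorized(String user, String protocol, String method) {
        checks++;
        Set<String> protocolUsers = protocolAcls.get(protocol);
        if (protocolUsers == null || !protocolUsers.contains(user)) {
            return false;
        }
        Set<String> methodUsers = methodAcls.get(protocol + "#" + method);
        return methodUsers == null || methodUsers.contains(user);
    }

    // Number of authorizations performed, to make the per-request cost visible.
    int checksPerformed() {
        return checks;
    }
}
```

With no method-level entries this behaves as pure service-level authorization.  The counter
grows with every call rather than with every connection, which is the performance trade-off
mentioned above.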

Sanjay> Would you be happier if we created an intermediate layer, say rpc-session, in between.
I am not seriously suggesting we do that.

We have two layers today.  We could add this at either layer.  It would be cleaner to add
it only at one layer, not mixed between the two, as in the current patch.  It would be simpler
to add it to the RPC layer, and I have yet to hear a strong reason why that would be wrong.
 That's all I'm saying.

> Adding service-level authorization to Hadoop
> --------------------------------------------
>                 Key: HADOOP-4348
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4348
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Arun C Murthy
>             Fix For: 0.20.0
>         Attachments: HADOOP-4348_0_20081022.patch
> Service-level authorization is the initial checking done by a Hadoop service to find
out if a connecting client is a pre-defined user of that service. If not, the connection or
service request will be declined. This feature allows services to limit access to a clearly
defined group of users. For example, service-level authorization allows "world-readable" files
on an HDFS cluster to be readable only by the pre-defined users of that cluster, not by anyone
who can connect to the cluster. It also allows an M/R cluster to define its group of users
so that only those users can submit jobs to it.
> Here is an initial list of requirements I came up with.
>     1. Users of a cluster are defined by a flat list of usernames and groups. A client
is a user of the cluster if and only if her username is listed in the flat list or one of
her groups is explicitly listed in the flat list. Nested groups are not supported.
>     2. The flat list is stored in a conf file and pushed to every cluster node so that
services can access them.
>     3. Services will monitor the modification of the conf file periodically (5-minute interval
by default) and reload the list if needed.
>     4. Checking against the flat list is done as early as possible and before any other
authorization checking. Both HDFS and M/R clusters will implement this feature.
>     5. This feature can be switched off and is off by default.
> I'm aware of interest in pulling user data from LDAP. For this JIRA, I suggest we implement
it using a conf file. Additional data sources may be supported via new JIRAs.
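
The requirements quoted above could be sketched roughly as follows.  ServiceUserList and its
methods are hypothetical names; the issue does not specify the conf file format, so the sketch
takes already-parsed usernames and groups plus a modification time, and leaves file reading
and the periodic timer out.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical flat list of service users (requirement 1), reloadable (requirement 3).
class ServiceUserList {
    private final boolean enabled;            // requirement 5: feature can be switched off
    private Set<String> users = new HashSet<>();
    private Set<String> groups = new HashSet<>();
    private long loadedModTime = -1;          // modification time of the list last loaded

    ServiceUserList(boolean enabled) {
        this.enabled = enabled;
    }

    // Replace the list only when the source is newer than what is already loaded.
    // Returns true if a reload happened.
    synchronized boolean reloadIfNewer(long modTime,
                                       Collection<String> newUsers,
                                       Collection<String> newGroups) {
        if (modTime <= loadedModTime) {
            return false;
        }
        users = new HashSet<>(newUsers);
        groups = new HashSet<>(newGroups);
        loadedModTime = modTime;
        return true;
    }

    // A client is a user if her username is listed or one of her groups is listed.
    // Groups are matched by name only; nested groups are not resolved.
    synchronized boolean isServiceUser(String username, Collection<String> userGroups) {
        if (!enabled) {
            return true;  // feature off: everyone passes this check
        }
        if (users.contains(username)) {
            return true;
        }
        for (String group : userGroups) {
            if (groups.contains(group)) {
                return true;
            }
        }
        return false;
    }
}
```

A service would call reloadIfNewer from whatever periodic task watches the conf file, and
isServiceUser before any other authorization check (requirement 4).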

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
