hadoop-common-dev mailing list archives

From "Kan Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4343) Adding user and service-to-service authentication to Hadoop
Date Wed, 04 Mar 2009 19:53:56 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678881#action_12678881 ]

Kan Zhang commented on HADOOP-4343:

Here is the authentication design I plan to implement. 

For all Hadoop services except the NameNode (NN), we simply use Kerberos. For NN, we complement Kerberos
with a second mechanism called [DIGEST-MD5|http://www.ietf.org/rfc/rfc2831.txt] (available
from the Java SASL library). A client can authenticate to NN in two ways.
* *Kerberos only* For example, a user accessing HDFS using Hadoop fs commands may use this mode.
* *Kerberos + DIGEST-MD5*  In this case, Kerberos is used for the initial authentication and
setting up a secure connection between a client and NN. After that, the client can obtain
a secret key from the server over the secure connection. This secret key is known only to
the client and NN, and can be used by the client to authenticate to NN on subsequent accesses.
Authentication using the secret key is done using the DIGEST-MD5 protocol, which doesn't involve
any third party, such as Kerberos KDC (key distribution center). The client can also delegate
the secret key to others, so that they may use the key to authenticate to NN as the client.
This is useful in cases where an M/R job needs to access NN as the job owner. Hereinafter,
we refer to the secret key as a *delegation token*. The reasons for introducing delegation tokens
(and the associated DIGEST-MD5 mechanism) are as follows.
** *Performance* On a Map/Reduce cluster, there can be thousands of Tasks running at the same
time. If they use Kerberos to authenticate to a NN, they need either a delegated TGT (ticket
granting ticket) or a delegated service ticket. If using delegated TGT, the Kerberos KDC could
become a bottleneck, since each task needs to get a Kerberos service ticket from the KDC using
the delegated TGT. Using delegation tokens avoids that network traffic to the KDC. Another
option is to use a delegated service ticket. Delegated service tickets can be used in a fashion
similar to delegation tokens, i.e., without the need to contact an online third party like
the KDC. However, Java GSS-API doesn't support service ticket delegation. We may need to use
a third-party (native) Kerberos library, which requires significantly more development effort
and makes the code less portable.
** *Credential renewal* For Tasks to use Kerberos, the Task owner's Kerberos TGT or service
ticket needs to be delegated and made available to the Tasks. Both TGT and service ticket
can be renewed for long-running jobs (up to the max lifetime set at initial issuance). However,
during Kerberos renewal, a new TGT or service ticket will be issued, which needs to be distributed
to all running Tasks. If using delegation tokens, the renewal mechanism can be designed in
such a way that only the validity period of a token is extended on the NN, but the token itself
stays the same. Hence, no new tokens need to be issued and pushed to running Tasks. Moreover,
renewing Kerberos tickets has to be done before the current validity period expires, which puts
a timing constraint on the renewal operation. Our delegation tokens can be renewed (or revived)
after the current validity period expires (but within the max lifetime) by the designated renewer.
Being able to renew an expired delegation token is not considered a big risk since (unlike
Kerberos) only the designated renewer can renew a token. A stolen token can't be renewed by
the attacker. 
** *Less damage when a credential is compromised* A user's Kerberos TGT may be used to access
services other than HDFS. If a delegated TGT is used and compromised, the damage is greater
than with an HDFS-only credential (delegation token). On the other hand, using a delegated
service ticket is equivalent to using a delegation token.
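
To make the renewal semantics above concrete, here is a minimal sketch of how the NN-side bookkeeping could look. It is not the planned implementation: the class name DelegationTokenTable, its methods, the string token ids and the "jobtracker" renewer are all hypothetical, and time is passed in as plain milliseconds to keep the example deterministic. It only illustrates three points from the design: renewal extends the validity period on the server while the token itself stays the same, an expired token can be revived within its max lifetime, and only the designated renewer may renew.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical NN-side table of delegation tokens (illustration only). */
public class DelegationTokenTable {

    /** Server-side record for one token; the token id itself never changes. */
    private static final class Entry {
        final String owner;    // user the token authenticates as
        final String renewer;  // the only principal allowed to renew
        final long maxDate;    // hard limit fixed when the token is issued
        long expiry;           // end of the current validity period

        Entry(String owner, String renewer, long expiry, long maxDate) {
            this.owner = owner;
            this.renewer = renewer;
            this.expiry = expiry;
            this.maxDate = maxDate;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long renewIntervalMs;
    private final long maxLifetimeMs;

    public DelegationTokenTable(long renewIntervalMs, long maxLifetimeMs) {
        this.renewIntervalMs = renewIntervalMs;
        this.maxLifetimeMs = maxLifetimeMs;
    }

    /** Registers a token for owner that only renewer may renew. */
    public void issue(String tokenId, String owner, String renewer, long now) {
        long maxDate = now + maxLifetimeMs;
        long expiry = Math.min(now + renewIntervalMs, maxDate);
        entries.put(tokenId, new Entry(owner, renewer, expiry, maxDate));
    }

    /** Returns the owner if the token is inside its validity period, else null. */
    public String authenticate(String tokenId, long now) {
        Entry e = entries.get(tokenId);
        return (e != null && now < e.expiry) ? e.owner : null;
    }

    /**
     * Extends the validity period in place and returns the new expiry.
     * Allowed even after the current period has expired, as long as the
     * caller is the designated renewer and the max lifetime has not passed.
     */
    public long renew(String tokenId, String caller, long now) {
        Entry e = entries.get(tokenId);
        if (e == null) {
            throw new IllegalArgumentException("unknown token");
        }
        if (!e.renewer.equals(caller)) {
            throw new SecurityException("caller is not the designated renewer");
        }
        if (now >= e.maxDate) {
            throw new IllegalStateException("max lifetime exceeded");
        }
        e.expiry = Math.min(now + renewIntervalMs, e.maxDate);
        return e.expiry;
    }

    public static void main(String[] args) {
        // Renew interval of 10 ms and max lifetime of 100 ms, chosen for readability.
        DelegationTokenTable nn = new DelegationTokenTable(10, 100);
        nn.issue("token-1", "alice", "jobtracker", 0);
        System.out.println(nn.authenticate("token-1", 5));   // inside validity period
        System.out.println(nn.authenticate("token-1", 15));  // validity period has lapsed
        nn.renew("token-1", "jobtracker", 15);               // revived after expiry
        System.out.println(nn.authenticate("token-1", 20));  // same token works again
    }
}
```

Because renew() only changes state held on the NN, Tasks that already hold the token need no update, which is the property the credential renewal argument relies on. A stolen token fails the renewer check and simply runs out at the end of its current validity period.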

> Adding user and service-to-service authentication to Hadoop
> -----------------------------------------------------------
>                 Key: HADOOP-4343
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4343
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
> Currently, Hadoop services do not authenticate users or other services. As a result,
> Hadoop is subject to the following security risks.
> 1. A user can access an HDFS or M/R cluster as any other user. This makes it impossible
> to enforce access control in an uncooperative environment. For example, file permission checking
> on HDFS can be easily circumvented.
> 2. An attacker can masquerade as Hadoop services. For example, user code running on a
> M/R cluster can register itself as a new TaskTracker.
> This JIRA is intended to be a tracking JIRA, where we discuss requirements, agree on
> a general approach and identify subtasks. Detailed design and implementation are the subject
> of those subtasks.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
