accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled
Date Wed, 04 Feb 2015 20:00:36 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305797#comment-14305797 ]

Josh Elser edited comment on ACCUMULO-3513 at 2/4/15 8:00 PM:
--------------------------------------------------------------

I haven't really read up on DIGEST-MD5. I'll have to look into that and see if there's
anything better we can use with SASL.

bq. The individual MapReduce nodes do not have Kerberos principals at all? How do they authenticate
to the job controller?

YARN processes have Kerberos principals and credentials, but the tasks they spawn do not.
Delegation tokens are the solution for those tasks. The user talks to the ResourceManager
with their Kerberos credentials and obtains delegation tokens for YARN (the same happens with
the NameNode and HDFS). These get stored inside the user's UserGroupInformation object and get
passed along through the RM, NM, app master and other containers, either over the protocol
or via filesystem-permission-secured files on disk (the latter happening when the NM drops
privileges to run as "you" instead of "itself").
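
To make that flow concrete, here's a rough sketch of the client side using stock Hadoop
APIs (the principal, keytab path, and renewer below are placeholders, nothing
Accumulo-specific):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.security.UserGroupInformation;

public class SubmitWithDelegationTokens {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Log in to the KDC; the current UserGroupInformation now holds the
    // user's Kerberos credentials. Principal and keytab are placeholders.
    UserGroupInformation.loginUserFromKeytab("user@EXAMPLE.COM", "/path/to/user.keytab");

    Job job = Job.getInstance(conf, "kerberized-job");

    // Use the Kerberos credentials to fetch HDFS delegation tokens (naming
    // the RM principal as the renewer) and stash them in the job's
    // Credentials. YARN ships those Credentials to the tasks, which never
    // see a keytab or TGT.
    FileSystem.get(conf).addDelegationTokens("yarn/_HOST@EXAMPLE.COM", job.getCredentials());

    // ... set mapper/reducer, input/output paths, etc., then submit:
    job.waitForCompletion(true);
  }
}
{code}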

bq. you have to talk to the TServer which issued it

This would require clients to hold onto N delegation tokens, though. That'd make the client
implementation much more difficult than a single delegation token that any node in the instance
can verify.

bq. If you use a single shared key, you really don't need leader election (because they all
have the secret and perform the same function)

You need the coordination to roll new secret keys. Using the same secret key for months
(given the average uptime of a cluster) is just asking for attacks.
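
For illustration, rolling could be as simple as a scheduled task that generates a fresh
HMAC key under a new id and retires keys too old for any live token to reference. This
is a made-up sketch, not the actual implementation:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical roller: keeps a window of recent keys so tokens created
// under the previous key remain verifiable until they expire.
public class KeyRoller {
  private final Map<Integer, SecretKey> keysById = new ConcurrentHashMap<>();
  private final AtomicInteger nextId = new AtomicInteger();

  public void start(final long rollIntervalMs, final long tokenMaxLifetimeMs) throws Exception {
    final KeyGenerator gen = KeyGenerator.getInstance("HmacSHA1");
    ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
    exec.scheduleAtFixedRate(() -> {
      int id = nextId.getAndIncrement();
      keysById.put(id, gen.generateKey());
      // Retire keys old enough that no live token can still reference them.
      int oldest = id - (int) (tokenMaxLifetimeMs / rollIntervalMs) - 1;
      keysById.keySet().removeIf(k -> k < oldest);
    }, 0, rollIntervalMs, TimeUnit.MILLISECONDS);
  }
}
{code}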

bq. I'm very curious precisely how you are generating these delegation tokens, though. I could
be on a completely separate page regarding that and your suggestion for leader elections.

Code will speak better than I can: https://github.com/joshelser/accumulo/tree/delegation-tokens/server/base/src/main/java/org/apache/accumulo/server/security/delegation.
I just finished this up, I think. Each Master and Tserver has a SecretManager implementation.
The Master (or, more generally, whoever creates the secret keys) also runs the KeyManager,
which generates a new secret key every $timelength. That process also uses the KeyDistributor
to add secret keys to ZK (for all of the "followers"). The "followers" (tservers) use the
KeyWatcher to see changes made by the KeyDistributor and update their SecretManager.
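
Roughly, the two halves of that look like the following (the znode path, serialization,
and SecretManagerCache interface are invented for the example; real code would also use
restrictive ACLs rather than OPEN_ACL_UNSAFE):

{code:java}
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class KeySync {
  static final String KEYS_PATH = "/accumulo/delegation_keys";

  interface SecretManagerCache {
    void addKey(int keyId, byte[] serializedKey);
  }

  // "Leader" side (KeyDistributor-ish): publish a new key under a child znode.
  static void distribute(ZooKeeper zk, int keyId, byte[] serializedKey) throws Exception {
    zk.create(KEYS_PATH + "/" + keyId, serializedKey,
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
  }

  // "Follower" side (KeyWatcher-ish): re-read the children when they change
  // and pull unseen keys into the local cache.
  static void watchKeys(final ZooKeeper zk, final SecretManagerCache cache) throws Exception {
    Watcher watcher = new Watcher() {
      @Override
      public void process(WatchedEvent event) {
        try {
          watchKeys(zk, cache); // re-register the watch and re-sync
        } catch (Exception e) {
          // log and retry in real code
        }
      }
    };
    List<String> children = zk.getChildren(KEYS_PATH, watcher);
    for (String child : children) {
      byte[] data = zk.getData(KEYS_PATH + "/" + child, false, null);
      cache.addKey(Integer.parseInt(child), data);
    }
  }
}
{code}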

In general, the SecretManager is a local cache off of ZooKeeper which can generate/verify
the passwords in delegation tokens. No mechanisms yet exist to ensure that all followers/tservers
have seen a new secret key. 
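
The generate/verify piece is just an HMAC over the token identifier, along these lines
(again a sketch, not the linked code):

{code:java}
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.SecretKey;

public class TokenPasswords {
  // The "password" carried in a delegation token is an HMAC over the token
  // identifier, keyed by one of the rolled secret keys (looked up by key id).
  static byte[] createPassword(byte[] tokenIdentifier, SecretKey key) throws Exception {
    Mac mac = Mac.getInstance("HmacSHA1");
    mac.init(key);
    return mac.doFinal(tokenIdentifier);
  }

  // Any server holding the same key can recompute and compare the HMAC, so
  // verification never requires a callback to the issuing server.
  static boolean verify(byte[] tokenIdentifier, byte[] password, SecretKey key) throws Exception {
    return MessageDigest.isEqual(createPassword(tokenIdentifier, key), password);
  }
}
{code}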



> Ensure MapReduce functionality with Kerberos enabled
> ----------------------------------------------------
>
>                 Key: ACCUMULO-3513
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3513
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Blocker
>             Fix For: 1.7.0
>
>         Attachments: ACCUMULO-3513-design.pdf
>
>
> I talked to [~devaraj] today about MapReduce support running on secure Hadoop to help
> get a picture of what extra work might be needed to make this work.
> Generally, in Hadoop and HBase, the client must have valid credentials to submit a job;
> the notion of delegation tokens is then used for further communication, since the servers
> do not have access to the client's sensitive information. A centralized service manages
> creation of a delegation token, which is a record containing certain information (such as
> the submitting user's name) necessary to securely identify the holder of the token.
> The general idea is that we would need to build support into the master to manage delegation
> tokens, which node managers acquire and use to run jobs. Hadoop and HBase both contain code
> which implements this general idea, but we will need to apply it to Accumulo and verify
> that M/R jobs still work in a kerberized environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
