accumulo-notifications mailing list archives

From "Christopher Tubbs (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled
Date Wed, 28 Jan 2015 05:26:34 GMT


Christopher Tubbs commented on ACCUMULO-3513:

bq. I'm not sure how we can make any reliable security model if we operate under the assumption
that YARN is insecure. We have to trust that the YARN task was correctly authenticated.

Right, we have to authenticate both YARN *and* the end user. Even if YARN doesn't work this
way, and it uses some delegation token instead of any identifying information about itself,
Accumulo's implementation requires a Kerberos token at the transport layer. You can't just
omit a Kerberos token and replace it with a delegation token in Accumulo's implementation
(nor do I think it'd be a good idea to try, because I do think we need to authenticate the
middle-man, in this case YARN).

bq. Again. We have to assume YARN is doing the right thing.

No, we absolutely do not have to make any such assumption. We can validate that by only whitelisting
approved, trusted intermediaries. This is no different than X.509 extensions that designate
permitted uses on certificates. The fact that a certificate was signed by the same CA does
not automatically make it appropriate for signing executable code or for encrypting email.
The only catch is that Kerberos has no built-in mechanism analogous to X.509 certificate
extensions, so a whitelist is the only option.
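To make the whitelist idea concrete, here is a minimal sketch (the principal names and list contents are hypothetical, not from any real deployment): a server honors a forwarded end-user identity only when the Kerberos principal that authenticated the transport layer is on an approved intermediary list, analogous to an X.509 extended-key-usage restriction.

```python
# Hypothetical sketch: accept a delegated (forwarded) identity only when the
# transport-layer Kerberos principal is an approved intermediary. All names
# below are made up for illustration.

TRUSTED_INTERMEDIARIES = {
    "yarn/resourcemanager.example.com@EXAMPLE.COM",
    "yarn/nodemanager1.example.com@EXAMPLE.COM",
}

def accept_delegated_auth(transport_principal: str, delegated_user: str) -> str:
    """Authenticate the middle-man first, then honor the end-user identity."""
    if transport_principal not in TRUSTED_INTERMEDIARIES:
        raise PermissionError(
            f"{transport_principal} is not whitelisted to act on behalf of others"
        )
    # Only now is it safe to proceed with authorization checks as this user.
    return delegated_user
```

The point of the sketch is the ordering: the intermediary is authenticated before any delegated identity is believed.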

bq. The code running inside a YARN task is untrusted (unless you restrict job submission and
vet the users externally – hit the users with a stick and tell them to behave). We should
not be trusting this code to act as the user that it should.

That's just my point... you don't know what is going on inside the YARN system. For all you
know, there could be a job accessing the local disk or system memory, searching for other
clients' credentials, and using them to connect to Accumulo. Just because YARN tries to connect
using some client's credentials doesn't mean it's a valid use (granted, such an attack takes effort).
You've got to actually lock down your YARN instance, vetting the infrastructure and the code
it runs, before you can be sure that the credentials a job in YARN uses to connect to Accumulo
are being used for a legitimate purpose. But once that is done, the precise degree of additional
security offered by the delegation token (its expirable attributes, for instance) is
debatable... though I concede that it is at least marginally better than nothing, so we can
move past that point if you like. If it has the ability to expire, I'm in favor.
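For a concrete picture of the expirable attribute being discussed, here is a hypothetical sketch of the kind of record a delegation token identifier carries (field names modeled loosely on Hadoop-style token identifiers; this is not Accumulo's actual implementation):

```python
from dataclasses import dataclass
import time

# Hypothetical sketch of a delegation token identifier: the record binds the
# end user's identity to an issue date and a hard expiration, which is the
# property that makes a stolen token less damaging than stolen long-lived
# Kerberos credentials.

@dataclass
class DelegationTokenIdentifier:
    owner: str            # the submitting user the token identifies
    renewer: str          # principal allowed to renew the token
    issue_date: float     # epoch seconds when the token was issued
    max_date: float       # epoch seconds after which the token is invalid
    sequence_number: int  # lets the issuer track and revoke tokens

    def is_expired(self, now=None):
        """True once the token's hard lifetime has passed."""
        return (now if now is not None else time.time()) > self.max_date
```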

bq. The shared secret is acting in place of the kerberos credentials because there is no credentials
available for use. ...

I'm not so sure that's true. There are no credentials representing the end user available
for use, but the YARN process itself should have some Kerberos identity, shouldn't it? I've
read that paper, and the quoted portion, but I had assumed (perhaps incorrectly)
that the YARN process would use its own Kerberos credentials to set up the transport layer,
over which it sends the delegation token for additional validation and authorization. I assumed
the wording about it using a delegation token in place of a Kerberos token was just shorthand
for something a bit more complicated. Otherwise, what network protocol is it using that supports
both Kerberos and a delegation token? Even if HDFS/YARN is using some custom protocol which
supports both (or two RPC endpoints), Accumulo's SASL implementation certainly is not... it
needs *some* Kerberos credentials to set up the transport layer, before we can send any delegation
token or whatever across.
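For what it's worth, the delegation-token check itself is separable from the transport question. In Hadoop-style schemes (as I understand them), the token's secret is an HMAC of the serialized token identifier under a master key held by the issuing service; a sketch of that validation step, which would run inside an already-authenticated transport rather than replacing it (the key and identifier bytes here are placeholders):

```python
import hashlib
import hmac

# Sketch of Hadoop-style delegation-token validation: the token "password"
# is an HMAC of the serialized token identifier under a master key known
# only to the issuing service. This check happens *inside* an authenticated
# transport; it does not establish transport-layer security by itself.

MASTER_KEY = b"server-side secret"  # placeholder; real systems rotate keys

def token_password(identifier: bytes, key: bytes = MASTER_KEY) -> bytes:
    """Derive the token's secret from its serialized identifier."""
    return hmac.new(key, identifier, hashlib.sha1).digest()

def validate_token(identifier: bytes, password: bytes) -> bool:
    """Constant-time check that the presented password matches the identifier."""
    return hmac.compare_digest(token_password(identifier), password)
```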

> Ensure MapReduce functionality with Kerberos enabled
> ----------------------------------------------------
>                 Key: ACCUMULO-3513
>                 URL:
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Blocker
>             Fix For: 1.7.0
> I talked to [~devaraj] today about MapReduce support running on secure Hadoop to help
get a picture about what extra might be needed to make this work.
> Generally, in Hadoop and HBase, the client must have valid credentials to submit a job;
the notion of delegation tokens is then used for further communication, since the servers
do not have access to the client's sensitive information. A centralized service manages creation
of a delegation token, which is a record containing certain information (such as the submitting
user name) necessary to securely identify the holder of the delegation token.
> The general idea is that we would need to build support into the master to manage delegation
tokens that node managers can acquire and use to run jobs. Hadoop and HBase both contain code
which implements this general idea, but we will need to apply it to Accumulo and verify that
M/R jobs still work in a kerberized environment.

This message was sent by Atlassian JIRA
