accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-3513) Ensure MapReduce functionality with Kerberos enabled
Date Wed, 28 Jan 2015 02:52:36 GMT


Josh Elser commented on ACCUMULO-3513:

bq. At some point, I think that's the best we can get. We cannot get direct access to the
client's credentials, so we must trust another party (in this case, the MapReduce servers).

Right, I agree with you here, of course, but we still need some way to control what happens
when non-strongly-authenticated users (those without Kerberos credentials) try to connect to
Accumulo. That's the crux of what we need to solve to make MapReduce actually work.

bq. We could require that the clients authenticate to Accumulo to generate a shared secret
(really, though, they just need to authenticate to the Authenticator implementation backing
Accumulo). This is analogous to the HDFS delegation token. The client can then give this shared
secret to the MapReduce layer to use when talking to Accumulo, to ensure that the client did
actually hand that secret to the MapReduce layer, requesting it to do work on its behalf

This is, like you say, ultimately what the delegation token boils down to, and what I plan
to do. Yes, we need to trust the ResourceManager to disallow users who have no credentials,
but we should still have some shared-secret support (a special token, or data inside of a token)
to avoid requiring additional configuration just to run MapReduce with Hadoop security.
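To make the shared-secret idea concrete, here is a minimal in-memory simulation of the handshake being described. All names here (`Master`, `issueSecret`, `validate`) are illustrative, not the Accumulo API: an already-authenticated client asks the master for an opaque secret, ships it with the job, and a task later presents it back in place of Kerberos credentials.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of the shared-secret flow; the real implementation
// would live in the Accumulo master and client libraries.
public class SharedSecretFlow {

    // Stands in for the master: issues and validates per-user secrets.
    static class Master {
        private final Map<String, String> issued = new HashMap<>();
        private final SecureRandom rng = new SecureRandom();

        // Called by a client that has already authenticated (e.g. via Kerberos).
        String issueSecret(String principal) {
            byte[] raw = new byte[32];
            rng.nextBytes(raw);
            String secret = Base64.getEncoder().encodeToString(raw);
            issued.put(secret, principal);
            return secret;
        }

        // Called when a task connects with the secret instead of Kerberos
        // credentials; returns the owning principal, or null if never issued.
        String validate(String secret) {
            return issued.get(secret);
        }
    }

    public static void main(String[] args) {
        Master master = new Master();
        // The client requests a secret and places it in the job configuration.
        String secret = master.issueSecret("alice@EXAMPLE.COM");
        // A task presents the secret; the master maps it back to the principal.
        System.out.println(master.validate(secret));           // alice@EXAMPLE.COM
        System.out.println(master.validate("forged-secret"));  // null
    }
}
```

Note that this sketch trusts whoever holds the secret, which is exactly why the surrounding discussion cares about YARN keeping it safe.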

bq. However, we still need to designate the MapReduce layer as trustable in some way... because
this layer could reuse one client's credentials to perform an unauthorized task and give the
results to a different user

Yes, we still need to trust that MapReduce is keeping the shared secret safe, which we know
it already does. The ability to expire a shared secret gives us _some more_ confidence that
the shared secret won't be reused by some unwanted party. The YARN tasks themselves are run
as the submitting user, so all we are relying on YARN to do is to set up a proper environment
running as the client (to be clear, the actual Unix user).

bq. The whitelist mechanism gives us some assurance that we've vetted that layer to not do
those sorts of things.

We don't need a whitelist mechanism unless you don't trust YARN itself, which doesn't make
any sense to me (and which I think you already agree on).

> Ensure MapReduce functionality with Kerberos enabled
> ----------------------------------------------------
>                 Key: ACCUMULO-3513
>                 URL:
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Blocker
>             Fix For: 1.7.0
> I talked to [~devaraj] today about MapReduce support running on secure Hadoop to help
get a picture of what extra might be needed to make this work.
> Generally, in Hadoop and HBase, the client must have valid credentials to submit a job;
the notion of delegation tokens is then used for further communication, since the servers
do not have access to the client's sensitive information. A centralized service manages creation
of a delegation token, which is a record containing certain information (such as the submitting
user name) necessary to securely identify the holder of the delegation token.
> The general idea is that we would need to build support into the master to manage delegation
tokens which node managers can acquire and use to run jobs. Hadoop and HBase both contain code
which implements this general idea, but we will need to apply it to Accumulo and verify that
M/R jobs still work in a kerberized environment.

This message was sent by Atlassian JIRA
