hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hbase/HBaseTokenAuthentication" by GaryHelmling
Date Thu, 10 Mar 2011 00:59:44 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hbase/HBaseTokenAuthentication" page has been changed by GaryHelmling.
The comment on this change is: Initial draft.
http://wiki.apache.org/hadoop/Hbase/HBaseTokenAuthentication

--------------------------------------------------

New page:
= HBase Token Authentication =
While HBase security now supports Kerberos authentication for client RPC connections, this
is only part of the puzzle for integration with secure Hadoop.  Kerberos authentication is
only used for direct client access to HDFS.  The Hadoop MapReduce framework instead uses a
DIGEST-MD5 authentication scheme, where the client is granted a signed "delegation token"
and a secret "token authenticator" (an HMAC-SHA1 of the delegation token, keyed with a
NameNode (NN) secret key) when a MapReduce job is submitted.  The token and authenticator are
serialized into a secure location in HDFS, so that the spawned Child processes can deserialize
the credentials and use them to re-authenticate to the NN as the submitting user.

Since Kerberos credentials are not available in the MapReduce task execution context, any
attempt by a task to authenticate to HBase using Kerberos will fail.  As a result, HBase
connections will need to support an alternate authentication scheme, similar to the one used
by the Hadoop MapReduce framework.

=== Goals ===
The main considerations for supporting MapReduce authentication are:

 1. The implementation should avoid any changes to core Hadoop code.  Any changes in Hadoop
will require a great deal more review and discussion before they could be accepted, and would
necessitate running a forked version of Hadoop for some time.
 1. Any changes should be transparent to existing MapReduce user code.  We shouldn't require
any new APIs to be used for authentication, for example.
 1. Changes to the job submission process, such as using a wrapper or utility to submit MapReduce
jobs, are preferable to any changes requiring code modifications.

== HBase Authentication Tokens ==
While Hadoop user delegation tokens provide an existing means of Map``Reduce task authentication,
their reliance on a secret key stored in memory on the Name``Node makes them unusable
for authentication in HBase.  Fortunately, the Hadoop security implementation and the Map``Reduce
job submission and execution code provide a generalized framework for token handling.  Building
on top of this, we can provide token-based authentication from MR tasks to HBase without any
core Hadoop or Map``Reduce changes.

=== Proposal: Adding an HBase user token ===
 1. extend {{{org.apache.hadoop.security.token.TokenIdentifier}}} with our own token implementation
 1. implement {{{org.apache.hadoop.security.token.SecretManager}}}
 1. master will generate a secret key for signing and authenticating tokens
   a. will need to persist somewhere (zookeeper?) to allow for master restarts and failover
   a. will need to distribute generated secret key to RS
     i. could be on region checkin/heartbeats, though stack is removing those
     i. could be distributed through zookeeper as well
 1. add a helper like {{{TableMapReduceUtil.initJob()}}} to use when submitting a new job
   a. will obtain a new token from master
   a. add token to Credentials instance
   a. normal {{{JobClient}}} code will serialize Credentials for MR job
 1. when running MR job, Credentials will be deserialized from secure location
   a. HBaseClient will look in credentials for any relevant tokens

==== Limitations ====
 1. It doesn't appear that we'll be able to use the existing delegation token renewal mechanism
(but do we really need token renewal?)

=== Token ===
The HBase authentication token is modeled directly after the Hadoop user delegation token.
We have dropped support for a designated renewer, however, as we will not be able to support
HBase token renewal without modification to core MapReduce code.  The token will consist
of:
 * Token``ID:
   1. Owner ID -- Username that this token will authenticate as
   1. Issue date -- timestamp (in msec) when this token was generated
   1. Expire date -- timestamp (in msec) at which this token expires
   1. Sequence -- to ensure uniqueness
 * Token``Authenticator := HMAC_SHA1(master key, Token``ID)
 * Authentication Token := (Token``ID, Token``Authenticator)
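
As a rough illustration of the definitions above, the sketch below builds a Token``ID and derives its Token``Authenticator using only the JDK.  The class name, method names and byte layout are assumptions made for this example; the actual encoding would be whatever the HBase {{{TokenIdentifier}}} subclass writes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Illustrative only: names and byte layout are assumptions, not the HBase wire format. */
final class TokenSketch {
    /** Encodes the four TokenID fields into bytes (hypothetical layout). */
    static byte[] encodeTokenId(String owner, long issueDate, long expireDate, long sequence) {
        byte[] ownerBytes = owner.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + ownerBytes.length + 3 * Long.BYTES);
        buf.putInt(ownerBytes.length).put(ownerBytes);
        buf.putLong(issueDate).putLong(expireDate).putLong(sequence);
        return buf.array();
    }

    /** TokenAuthenticator := HMAC_SHA1(master key, TokenID). */
    static byte[] authenticator(byte[] masterKey, byte[] tokenId) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(masterKey, "HmacSHA1"));
            return mac.doFinal(tokenId);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA1 unavailable", e);
        }
    }

    public static void main(String[] args) {
        // Demo key only; a real master key would be randomly generated.
        byte[] masterKey = "demo-master-key".getBytes(StandardCharsets.UTF_8);
        byte[] tokenId = encodeTokenId("alice", 1_000L, 2_000L, 1L);
        byte[] auth = authenticator(masterKey, tokenId);
        System.out.println("authenticator bytes: " + auth.length);
    }
}
```

Because the authenticator is a keyed MAC over the whole Token``ID, changing any field (including the sequence number) yields a different authenticator.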

==== Authentication ====
HBase token authentication builds on top of DIGEST-MD5 authentication support provided by
Hadoop RPC.  HBase token authentication follows the same process as Hadoop user delegation
token authentication by the !NameNode:
 1. Client sends Token``ID to server
 1. Server uses Token``ID and the in-memory master secret key to regenerate Token``Authenticator
 1. Server validates Token``ID, checks for expiration
 1. Server and client then use Token``Authenticator as the shared secret to negotiate DIGEST-MD5
authentication
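
The server-side steps above can be sketched as follows.  This is a minimal JDK-only illustration, assuming a made-up text encoding for the Token``ID; {{{TokenAuthFlowSketch}}}, {{{TokenId}}} and {{{retrieveSharedSecret}}} are hypothetical names, and the DIGEST-MD5 negotiation itself is left to the SASL layer.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Illustrative only: class and method names are assumptions, not HBase or Hadoop APIs. */
final class TokenAuthFlowSketch {
    /** The TokenID fields from the Token section, with a made-up text encoding. */
    static final class TokenId {
        final String owner;
        final long issueDate, expireDate, sequence;

        TokenId(String owner, long issueDate, long expireDate, long sequence) {
            this.owner = owner;
            this.issueDate = issueDate;
            this.expireDate = expireDate;
            this.sequence = sequence;
        }

        byte[] toBytes() {
            String s = owner + "|" + issueDate + "|" + expireDate + "|" + sequence;
            return s.getBytes(StandardCharsets.UTF_8);
        }
    }

    static byte[] hmacSha1(byte[] key, byte[] data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(key, "HmacSHA1"));
            return mac.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA1 unavailable", e);
        }
    }

    /**
     * Server side: reject an expired TokenID, otherwise regenerate the authenticator
     * that both parties will use as the DIGEST-MD5 shared secret.
     */
    static byte[] retrieveSharedSecret(byte[] masterKey, TokenId id, long nowMillis) {
        if (nowMillis >= id.expireDate) {
            throw new IllegalArgumentException("token for " + id.owner + " has expired");
        }
        return hmacSha1(masterKey, id.toBytes());
    }

    public static void main(String[] args) {
        byte[] masterKey = "demo-master-key".getBytes(StandardCharsets.UTF_8);
        TokenId id = new TokenId("alice", 1_000L, 2_000L, 1L);
        // The client received this authenticator from the master when the token was issued.
        byte[] clientSecret = hmacSha1(masterKey, id.toBytes());
        // The server only sees the TokenID and derives the same secret from its master key.
        byte[] serverSecret = retrieveSharedSecret(masterKey, id, 1_500L);
        System.out.println("secrets match: " + MessageDigest.isEqual(clientSecret, serverSecret));
    }
}
```

Only the Token``ID crosses the wire; the authenticator is derived independently on each side, which is why every server that accepts tokens needs the master secret key.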

==== Master Secret Key ====
Authentication relies on a secret key generated at runtime on the master and used to generate
Authentication Tokens for clients.  Tokens will be generated on the master for Kerberos-authenticated
clients, but token-based authentication will need to be accepted on all masters and region
servers in a cluster, so the master will need a means to distribute the secret key to the other
cluster nodes.

The master will also need to write the secret key to persistent storage in order for authentication
tokens to survive a cluster restart.
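
A minimal sketch of generating the key and turning it into a storable form, using only the JDK.  Where the encoded key lives (ZooKeeper or elsewhere) is still open, so the example stops at producing and re-reading a string; {{{MasterKeySketch}}} and its methods are hypothetical names, and Base64 is just one convenient encoding.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

/** Illustrative only: names and the Base64 encoding are assumptions for this example. */
final class MasterKeySketch {
    private static final String ALGORITHM = "HmacSHA1";

    /** Generates a fresh random key suitable for HMAC-SHA1 signing. */
    static SecretKey generate() {
        try {
            return KeyGenerator.getInstance(ALGORITHM).generateKey();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(ALGORITHM + " unavailable", e);
        }
    }

    /** Encodes the raw key bytes as text for persistence or distribution. */
    static String encode(SecretKey key) {
        return Base64.getEncoder().encodeToString(key.getEncoded());
    }

    /** Rebuilds the key on another node, or on the master after a restart. */
    static SecretKey decode(String stored) {
        return new SecretKeySpec(Base64.getDecoder().decode(stored), ALGORITHM);
    }

    public static void main(String[] args) {
        SecretKey original = generate();
        SecretKey restored = decode(encode(original));
        System.out.println("round trip ok: "
            + Arrays.equals(original.getEncoded(), restored.getEncoded()));
    }
}
```

Whatever store is chosen must be readable only by the HBase server processes, since anyone holding the key can mint valid tokens.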

==== Implementation ====
 1. Extend {{{org.apache.hadoop.security.token.TokenIdentifier}}} with new HBase type
 1. Implement {{{org.apache.hadoop.security.token.TokenSelector}}} to pull out HBase type
tokens
 1. Extend {{{org.apache.hadoop.security.token.SecretManager}}} with implementation to generate
HBase tokens.  This will be used on HMaster to generate HBase tokens, and on HRegionServer
to validate tokens for authentication.
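
To make the first item more concrete, the sketch below shows the serialization half of such an identifier.  It deliberately avoids any Hadoop dependency: the {{{write}}}/{{{readFields}}} pair only mirrors the shape of Hadoop's {{{Writable}}} contract, and the class name, field names and kind string are assumptions.  A real implementation would extend {{{TokenIdentifier}}} and also supply {{{getKind()}}} and {{{getUser()}}}.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Stand-in for an HBase TokenIdentifier subclass; not the real Hadoop base class. */
final class AuthTokenIdentifierSketch {
    static final String KIND = "HBASE_AUTH_TOKEN"; // hypothetical token kind

    String owner = "";
    long issueDate;
    long expireDate;
    long sequence;

    /** Same shape as Writable.write(DataOutput). */
    void write(DataOutput out) throws IOException {
        out.writeUTF(owner);
        out.writeLong(issueDate);
        out.writeLong(expireDate);
        out.writeLong(sequence);
    }

    /** Same shape as Writable.readFields(DataInput); field order must match write(). */
    void readFields(DataInput in) throws IOException {
        owner = in.readUTF();
        issueDate = in.readLong();
        expireDate = in.readLong();
        sequence = in.readLong();
    }

    /** Serialized form: the bytes that would be signed and sent to the server. */
    byte[] toBytes() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static AuthTokenIdentifierSketch fromBytes(byte[] data) {
        try {
            AuthTokenIdentifierSketch id = new AuthTokenIdentifierSketch();
            id.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return id;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        AuthTokenIdentifierSketch id = new AuthTokenIdentifierSketch();
        id.owner = "alice";
        id.issueDate = 1_000L;
        id.expireDate = 2_000L;
        id.sequence = 1L;
        AuthTokenIdentifierSketch copy = fromBytes(id.toBytes());
        System.out.println(KIND + " for " + copy.owner + ", expires " + copy.expireDate);
    }
}
```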

=== Map Reduce Flow ===
For all of this to work without changes to Hadoop and MapReduce code, we have two key requirements:
 1. We must be able to add our own tokens to the MR job Credentials instance at job submission
time (and the job must be able to serialize our token correctly with the rest of the job info)
 1. The Child task executing on each node must deserialize our token and add it to the {{{UserGroupInformation}}}
instance so it can later be picked up by the HBase client for authentication

==== Job Submission ====
 1. Add a new utility class {{{SecureMapReduceUtil}}} with a static helper method, something
like {{{void initAuthentication(Job job)}}}
   a. Call Master to obtain a new authentication token for the logged-in user
     * Token will only be returned if user is authenticated via Kerberos, same as HDFS
   a. Add HBase token to job credentials -- {{{job.getCredentials().addToken(Text alias, Token)}}}
     * {{{FileSystem.getCanonicalServiceName()}}} is used as the alias for HDFS delegation
tokens; what should we use for HBase?
 1. {{{Job.submit()}}} is later called normally, which should serialize token with the rest
of the job credentials
   a. {{{JobTracker.submitJob()}}} receives the credentials via RPC and adds them to a {{{JobInProgress}}}
instance added to the job queue
   a. Scheduler will write out the tokens when the job is run.  {{{JobInProgress.initTasks()}}}
-> {{{generateAndStoreTokens()}}} -> {{{Credentials.writeTokenStorageFile()}}}
   a. The serialized tokens will be written to {{{<jobdir>/jobToken}}}

==== Job Execution on Task Nodes ====
 1. On task start, {{{Child.main()}}} will read in a copy of the tokens from the local filesystem
using {{{TokenCache.loadTokens()}}}; the local path is passed in as an environment variable
 1. Each token is added to the child task {{{UserGroupInformation}}} instance used to run
the local task
 1. Any HBase connections opened by the task will inherit the same UGI
 1. A {{{TokenInfo}}} annotation on the {{{HRegionInterface}}} and {{{HMasterInterface}}}
protocol interfaces identifies the HBase {{{TokenSelector}}} implementation, which is then
used to extract the relevant authentication token from the UGI's credentials
 1. Using the HBase authentication token, the authentication process proceeds as above
