hadoop-common-dev mailing list archives

From Larry McCay <lmc...@hortonworks.com>
Subject Re: [DISCUSS] Hadoop SSO/Token Server Components
Date Wed, 03 Jul 2013 20:13:40 GMT
Thanks, Brian!
Look at that - the power of collaboration - the numbering is correct already! ;-)

I am inclined to agree that we should start with the Hadoop SSO Tokens and am leaning toward
a new jira that leaves behind the cruft, but I don't feel very strongly about it being new.
I do feel, especially given Kai's new document, that we should have only one.

On Jul 3, 2013, at 2:32 PM, Brian Swan <Brian.Swan@microsoft.com> wrote:

> Thanks, Larry, for starting this conversation (and thanks for the great Summit meeting
summary you sent out a couple of days ago). To weigh in on your specific discussion points
(and renumber them :-))...
> 
> 1. Are there additional components that would be required for a Hadoop SSO service?
> Not that I can see.
> 
> 2. Should any of the above described components be considered not actually necessary
or poorly described?
> I think this will be determined as we get into the details of each component. What you've
described here is certainly an excellent starting point.
> 
> 3. Should we create a new umbrella Jira to identify each of these as a subtask?
> 4. Should we just continue to use 9533 for the SSO server and add additional subtasks?
> What is described here seems to fit with 9533, though 9533 may contain some details that
need further discussion. IMHO, it may be better to file a new umbrella Jira, though I'm not
100% convinced of that. Would be very interested in input from others.
> 
> 5. What are the natural seams of separation between these components and any dependencies
between one and another that affect priority?
> Is 4 the right place to start? (4. Hadoop SSO Tokens: the exact shape and form of the
sso tokens...) It seemed that in some 1:1 conversations after the Summit meeting that others
may agree with this. Would like to hear if that is the case more broadly.
> 
> -Brian
> 
> -----Original Message-----
> From: Larry McCay [mailto:lmccay@hortonworks.com] 
> Sent: Tuesday, July 2, 2013 1:04 PM
> To: common-dev@hadoop.apache.org
> Subject: [DISCUSS] Hadoop SSO/Token Server Components
> 
> All -
> 
> As a follow up to the discussions that were had during Hadoop Summit, I would like to
introduce the discussion topic around the moving parts of a Hadoop SSO/Token Service.
> There are a couple of related Jiras that can be referenced and may or may not be updated
as a result of this discussion thread.
> 
> https://issues.apache.org/jira/browse/HADOOP-9533
> https://issues.apache.org/jira/browse/HADOOP-9392
> 
> As the first aspect of the discussion, we should probably state the overall goals and
scoping for this effort:
> * An alternative authentication mechanism to Kerberos for user authentication
> * A broader capability for integration into enterprise identity and SSO solutions
> * Possibly the advertisement/negotiation of available authentication mechanisms
> * Backward compatibility for the existing use of Kerberos
> * No (or minimal) changes to existing Hadoop tokens (delegation, job, block access, etc)
> * Pluggable authentication mechanisms across: RPC, REST and webui enforcement points
> * Continued support for existing authorization policy/ACLs, etc
> * Keeping more fine grained authorization policies in mind - like attribute based access
control
> 	- fine grained access control is a separate but related effort that we must not preclude
with this effort
> * Cross cluster SSO
> 
> In order to tease out the moving parts, here are a couple of high-level, simplified descriptions
of the SSO interaction flow:
>                               +------+
> 	+------+ credentials 1 | SSO  |
> 	|CLIENT|-------------->|SERVER|
> 	+------+  :tokens      +------+
> 	  2 |                    
> 	    | access token
> 	    V :requested resource
> 	+-------+
> 	|HADOOP |
> 	|SERVICE|
> 	+-------+
> 	
> The above diagram represents the simplest interaction model for an SSO service in Hadoop.
> 1. client authenticates to SSO service and acquires an access token
>  a. client presents credentials to an authentication service endpoint exposed by the
SSO server (AS) and receives a token representing the authentication event and verified identity
>  b. client then presents the identity token from 1.a. to the token endpoint exposed by
the SSO server (TGS) to request an access token to a particular Hadoop service and receives
an access token
> 2. client presents the Hadoop access token to the Hadoop service for which
the access token has been granted and requests the desired resource or services
>  a. access token is presented as appropriate for the service endpoint protocol being
used
>  b. Hadoop service token validation handler validates the token and verifies its integrity
and the identity of the issuer
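
The flow above can be sketched roughly as follows. This is an illustration only, not a proposed design: the function names (`authenticate`, `request_access_token`, `hadoop_service`), the claim names (`type`, `sub`, `aud`) and the in-memory `USERS` store are all hypothetical, and an HMAC shared secret stands in for the PKI-based trust discussed later in the thread.

```python
import base64
import hashlib
import hmac
import json

# Assumption: a shared secret replaces the PKI trust model for brevity.
SSO_SECRET = b"demo-secret-not-for-production"
USERS = {"alice": "wonderland"}  # hypothetical credential store


def sign_token(claims: dict) -> str:
    """Serialize the claims and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(SSO_SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig


def verify_token(token: str) -> dict:
    """Return the claims if the signature checks out, else raise ValueError."""
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(SSO_SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("token signature mismatch")
    return json.loads(base64.urlsafe_b64decode(payload))


def authenticate(user: str, password: str) -> str:
    """Step 1.a (AS endpoint): validate credentials, return an identity token."""
    if USERS.get(user) != password:
        raise PermissionError("bad credentials")
    return sign_token({"type": "identity", "sub": user})


def request_access_token(identity_token: str, service: str) -> str:
    """Step 1.b (TGS endpoint): exchange an identity token for an access token."""
    claims = verify_token(identity_token)
    if claims.get("type") != "identity":
        raise ValueError("expected an identity token")
    return sign_token({"type": "access", "sub": claims["sub"], "aud": service})


def hadoop_service(access_token: str, service: str, resource: str) -> str:
    """Step 2: the service validates the token before serving the resource."""
    claims = verify_token(access_token)
    if claims.get("type") != "access" or claims.get("aud") != service:
        raise PermissionError("token not valid for this service")
    return f"{claims['sub']} -> {resource}"
```

A client would call `authenticate`, pass the result to `request_access_token` along with a service name, and then present the access token to `hadoop_service`. An access token issued for one service is rejected by another because the `aud` claim does not match.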
> 
>    +------+
>    |  IdP |
>    +------+
>    1   ^ credentials
>        | :idp_token
>        |                      +------+
> 	+------+  idp_token  2 | SSO  |
> 	|CLIENT|-------------->|SERVER|
> 	+------+  :tokens      +------+
> 	  3 |                    
> 	    | access token
> 	    V :requested resource
> 	+-------+
> 	|HADOOP |
> 	|SERVICE|
> 	+-------+
> 	
> 
> The above diagram represents a slightly more complicated interaction model for an SSO
service in Hadoop that removes Hadoop from the credential collection business.
> 1. client authenticates to a trusted identity provider within the enterprise and acquires
an IdP specific token
>  a. client presents credentials to an enterprise IdP and receives a token representing
the authentication identity
> 2. client authenticates to SSO service and acquires an access
token
>  a. client presents idp_token to an authentication service endpoint exposed by the SSO
server (AS) and receives a token representing the authentication event and verified identity
>  b. client then presents the identity token from 2.a. to the token endpoint exposed by
the SSO server (TGS) to request an access token to a particular Hadoop service and receives
an access token
> 3. client presents the Hadoop access token to the Hadoop service for which
the access token has been granted and requests the desired resource or services
>  a. access token is presented as appropriate for the service endpoint protocol being
used
>  b. Hadoop service token validation handler validates the token and verifies its integrity
and the identity of the issuer
> 	
> Considering the above set of goals and high level interaction flow description, we can
start to discuss the component inventory required to accomplish this vision:
> 
> 1. SSO Server Instance: this component must be able to expose endpoints for both authentication
of users by collecting and validating credentials and federation of identities represented
by tokens from trusted IdPs within the enterprise. The endpoints should be composable so as
to allow for multifactor authentication mechanisms. They will also need to return tokens that
represent the authentication event and verified identity as well as access tokens for specific
Hadoop services.
> 
> 2. Authentication Providers: pluggable authentication mechanisms must be easily created
and configured for use within the SSO server instance. They will ideally allow the enterprise
to plug in their preferred off-the-shelf components as well as provide custom providers.
Supporting existing standards for such authentication providers should be a top priority concern.
There are a number of standard approaches in use in the Java world: JAAS loginmodules, servlet
filters, JASPIC authmodules, etc. A pluggable provider architecture that allows the enterprise
to leverage existing investments in these technologies and existing skill sets would be ideal.
> 
> 3. Token Authority: a token authority component would need to have the ability to issue,
verify and revoke tokens. This authority will need to be trusted by all enforcement points
that need to verify incoming tokens. Using something like PKI for establishing trust will
be required.
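
As a rough illustration of the issue/verify/revoke responsibilities described for the token authority, a minimal sketch might look like the following. The class name `TokenAuthority`, its method names, the `jti`/`sub`/`exp` claim names and the in-memory revocation set are all assumptions made for the example; an HMAC key is used here in place of the PKI that the thread says would be required.

```python
import hashlib
import hmac
import json
import time
import uuid


class TokenAuthority:
    """Issues, verifies and revokes tokens (illustrative only)."""

    def __init__(self, secret: bytes):
        # Assumption: an HMAC key stands in for PKI-based trust.
        self._secret = secret
        self._revoked = set()  # ids of tokens that have been revoked

    def _sign(self, body: str) -> str:
        return hmac.new(self._secret, body.encode(), hashlib.sha256).hexdigest()

    def issue(self, subject: str, ttl_seconds: float = 300) -> str:
        """Create a signed token with a unique id and an expiry time."""
        claims = {
            "jti": uuid.uuid4().hex,
            "sub": subject,
            "exp": time.time() + ttl_seconds,
        }
        body = json.dumps(claims, sort_keys=True)
        return body + "|" + self._sign(body)

    def verify(self, token: str) -> dict:
        """Check signature, expiry and revocation; return the claims."""
        body, _, sig = token.rpartition("|")
        if not hmac.compare_digest(sig, self._sign(body)):
            raise ValueError("bad signature")
        claims = json.loads(body)
        if claims["exp"] < time.time():
            raise ValueError("token expired")
        if claims["jti"] in self._revoked:
            raise ValueError("token revoked")
        return claims

    def revoke(self, token: str) -> None:
        """Mark a currently valid token as revoked."""
        self._revoked.add(self.verify(token)["jti"])
```

With a shared key, every enforcement point that verifies tokens could also mint them, which is one reason an asymmetric (PKI) scheme is the more natural fit for the trust model described above.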
> 
> 4. Hadoop SSO Tokens: the exact shape and form of the sso tokens will need to be considered
in order to determine the means by which trust and integrity are ensured while using them.
There may be some abstraction of the underlying format provided through interface based design
but all token implementations will need to have the same attributes and capabilities in terms
of validation and cryptographic verification.
> 
> 5. SSO Protocol: the lowest common denominator protocol for SSO server interactions across
client types would likely be REST. Depending on the REST client in use it may require explicitly
coding to the token flow described in the earlier interaction descriptions or a plugin may
be provided for things like HTTPClient, curl, etc. RPC clients will have this taken care of for
them within the SASL layer and will leverage the REST endpoints as well. This likely implies
that the RPC client must be able to trust the SSO server's identity cert that
is presented over SSL.
> 
> 6. REST Client Agent Plugins: required for encapsulating the interaction with the SSO
server for the client programming models. We may need these for many client types: e.g. Java,
JavaScript, .Net, Python, cURL etc.
> 
> 7. Server Side Authentication Handlers: the server side of the REST, RPC or webui connection
will need to be able to validate and verify the incoming Hadoop tokens in order to grant or
deny access to requested resources.
> 
> 8. Credential/Trust Management: throughout the system - on client and server sides -
we will need to manage and provide access to PKI and potentially shared secret artifacts in
order to establish the required trust relationships to replace the mutual authentication that
would otherwise be provided by using Kerberos everywhere.
> 
> So, discussion points:
> 
> 1. Are there additional components that would be required for a Hadoop SSO service?
> 2. Should any of the above described components be considered not actually necessary
or poorly described?
> 2. Should we create a new umbrella Jira to identify each of these as a subtask?
> 3. Should we just continue to use 9533 for the SSO server and add additional subtasks?
> 4. What are the natural seams of separation between these components and any dependencies
between one and another that affect priority?
> 
> Obviously, each component that we identify will have a jira of its own - more than likely
- so we are only trying to identify the high level descriptions for now.
> 
> Can we try and drive this discussion to a close by the end of the week? This will allow
us to start breaking out into component implementation plans.
> 
> thanks,
> 
> --larry
> 

