hadoop-common-dev mailing list archives

From "Zheng, Kai" <kai.zh...@intel.com>
Subject RE: [DISCUSS] Security Efforts and Branching
Date Thu, 26 Sep 2013 07:28:37 GMT
Larry and all,

Apologies for not responding sooner. I have read your proposals and thought about how we can
collaborate well and speed things up for all of us. From the community discussions around the
Hadoop Summit, TokenAuth should be a pluggable full stack that accommodates different
implementations. HADOOP-9392 reflects that thinking and produced the breakdown attached to the
JIRA. To simplify the discussion, I will try to illustrate it here at a very high level as follows.

Simply put, we would have:
TokenAuth = TokenAuth framework + TokenAuth implementation (HAS) + TokenAuth integration

= TokenAuth framework =
This defines TokenAuth as the desired pluggable framework, providing the required APIs,
protocols, flows, and facilities, along with common implementations for related constructs,
entities, and even services. The framework is a subject for continued discussion and should be
defined together as a common effort of the community. It's important that the framework be
pluggable in all the key places so that particular solutions can employ their own product-level
implementations. Based on this framework, we could build the HAS implementation. Initially, we
have the following items to think about in order to define the relevant APIs and provide the
core facilities for the framework; the list is open to additions:
1. Common token definition (a sketch follows this list);
2. TokenAuthn method for Hadoop RPC;
3. Authentication Service;
4. Identity Token Service;
5. Access Token Service;
6. Fine grained authorization;
7. Attribute Service;
8. Token authentication client;
9. Token cache;
10. Common configuration across TokenAuth;
11. Hadoop token command;
12. Key Provider;
13. Web SSO support;
14. REST SSO support;
15. Auditing support.
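
To make item 1 concrete, here is a minimal sketch of the kind of common token abstraction the
framework would need to pin down. All names are hypothetical and only illustrate the shape of
the API, not a proposed final form:

    import java.util.Map;

    // Hypothetical sketch of a common token definition; the real API is to be
    // settled by the community as part of the framework discussion.
    public interface SecurityToken {
      String getSubject();                 // principal the token was issued to
      String getIssuer();                  // the issuing service
      long getExpiryTime();                // expiry time in ms since the epoch
      Map<String, String> getAttributes(); // attributes for fine-grained authz
      byte[] getSignature();               // issuer's signature over the content
      byte[] encode();                     // wire form for RPC/REST transport
    }

Both identity tokens and access tokens could then specialize such a common definition, which is
what would let the token cache, the hadoop token command, and the RPC TokenAuthn method stay
implementation-agnostic.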

= TokenAuth implementation (HAS) =
This defines and implements the Hadoop AuthN/AuthZ Server (HAS) based on the TokenAuth framework.
HAS is a centralized server that addresses the AAA (Authentication, Authorization, Auditing)
concerns for Hadoop across the ecosystem. The 'A' of HAS could stand for "Authentication",
"Authorization", or "Auditing", depending on which role(s) HAS is provisioned with. HAS is a
complete and enterprise-ready security solution built on the TokenAuth framework, utilizing the
common facilities provided by the framework. It customizes and provides all the necessary
implementations of the constructs, entities, and services defined in the framework that are
required by enterprise deployments. Initially, we have the following items for the implementation:
1. Provide common and management facilities, including a configuration loading/syncing mechanism,
auditing and logging support, a shared high-availability approach, REST support, and so on;
2. Implement the Authentication Server role for HAS, implementing the Authentication Service and
Identity Token Service defined in the framework. The authentication engine can be configured
with a chain of authentication modules to support multi-factor authentication (see the sketch
after this list). In particular, it will support LDAP authentication;
3. Implement the Authorization Server role for HAS, implementing the Access Token Service;
4. Implement centralized administration of fine-grained authorization for the Authorization
Server role. Optional in the initial iteration;
5. Implement the Attribute Service for HAS, to allow integration of third-party attribute
authorities. Optional in the initial iteration;
6. Provide an authorization enforcement library for Hadoop services to enforce security policies
utilizing the related services provided by the Authorization Server. Optional in the initial
iteration.
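
As a rough illustration of the authentication module chain in item 2, the engine might simply
run every configured module and require all of them to succeed. This is only a sketch with
made-up names; the real modules would likely be JAAS-based (as HADOOP-9392 prefers) or
Shiro-based (as HADOOP-9533 suggests):

    import java.util.List;
    import java.util.Map;

    // Hypothetical sketch of a chained authentication engine for HAS.
    interface AuthenticationModule {
      // Validates one factor (e.g. an LDAP bind); throws on failure.
      void authenticate(Map<String, char[]> credentials) throws Exception;
    }

    class AuthenticationEngine {
      private final List<AuthenticationModule> chain;

      AuthenticationEngine(List<AuthenticationModule> chain) {
        this.chain = chain;
      }

      // Multi-factor authentication: every configured module must succeed
      // before the Identity Token Service issues an id_token.
      void authenticate(Map<String, char[]> credentials) throws Exception {
        for (AuthenticationModule module : chain) {
          module.authenticate(credentials);
        }
      }
    }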

= TokenAuth integration =
This includes tasks that employ the TokenAuth framework and the relevant implementation(s) to
enable the related support in various Hadoop components across the ecosystem for typical
enterprise deployments. Currently we have the following in mind:
1. Enable the Web SSO flow for web interfaces such as those of HDFS and YARN;
2. Enable the REST SSO flow for REST interfaces such as Oozie's;
3. Add Thrift and Hive JDBC support using TokenAuth. We consider this support because it is
an important interface for enterprises to interact with data;
4. Enable access to ZooKeeper using TokenAuth, since it's widely used as the coordinator across
the ecosystem.

I regard decoupling the pluggable framework from any specific implementation as important, since
we're addressing similar requirements but have different implementation considerations in
approaches like those represented by HADOOP-9392 and HADOOP-9533. For example, to support
pluggable authentication, HADOOP-9392 prefers JAAS-based authentication modules while HADOOP-9533
suggests using Apache Shiro. Through this decoupling we could best collaborate and contribute.
As far as I understood, you might agree with this approach, as can be seen in your recent email:
"decouple the pluggable framework from any specific central server implementation". If I
understood you correctly, do you think for the initial iteration we have to have two central
servers, a HAS server and an HSSO server? If not, would it work for us to have HAS as a community
effort alongside the TokenAuth framework, with both of us contributing to the implementation?

To proceed, I will try to align our efforts, complementing your proposal and addressing your
concerns as follows.

= Iteration Endstate =
Besides what you mentioned from the user's view, how about adding this consideration:
the initial iteration should also lay down the foundational TokenAuth framework with well-defined
APIs, protocols, flows, and core facilities for implementations. The framework should spare
future implementations rework and big changes.

= Terminology and Naming =
It would be great if we could unify the related terminology in this effort, at least at the
framework level. This could probably be achieved in the process of defining the relevant APIs
for the TokenAuth framework.

= Project scope =
It's great that we have a common in-scope list for the first iteration, as you mentioned:
Usecases:
client types: REST, CLI, UI
authentication types: Simple, Kerberos, authentication/LDAP, federation/SAML

We might also consider OAuth 2.0 support. Please note that by defining this in-scope list we
establish what is required as a must-have in the iteration, enforcing our consensus; however, it
should not prevent any relevant party from contributing more in the meantime, unless that would
not be appropriate at the time.

= Branch =
As you mentioned, we may have different branches for different features with the merge in mind.
Another approach is having just one branch with the relevant security features; the review and
merge work can still be JIRA-based.

1. Based on your proposal, how about the following as the scope of the branch(es):
1) Pluggable Authentication and Token based SSO
2) CryptoFS for volume level encryption (HCFS)
3) Pluggable UGI change
4) Key management system
5) Unified authorization

2. With the above scope in mind, a candidate branch name could be 'security-branch' instead
of 'tokenauth-branch'. How about creating the branch now, if there are no other concerns?

3. Check-in philosophy. I agree with your proposal, with slight concerns:
In terms of check-in philosophy, we should take a review then check-in approach to the branch
with lazy consensus - wherein we do not need to explicitly +1 every check-in to the branch
but we will honor any -1's with discussion to resolve before checking in. This will provide
us each with the opportunity to track the work being done and ensure that we understand it
and find that it meets the intended goals.

We might need an explicit +1; otherwise we would need to define a time window to wait before
checking in. One issue we would like to clarify: does voting also include the security branch
committers?

= JIRA =
We might not need an additional umbrella JIRA for now, since we already have HADOOP-9392 and
HADOOP-9533. By the way, I would suggest we use the existing feature JIRAs to discuss the
relevant and specific issues as we go. Leveraging these JIRAs, we might avoid putting too much
detail in the common-dev thread, and it's also easier to track the relevant discussions.

I agree it's a good point to start with an inventory of the existing JIRAs. We can do that
if there are no other concerns. We will then provide the full list of breakdown JIRAs and attach
it to HADOOP-9392 for further collaboration.

Regards,
Kai

From: larry mccay [mailto:larry.mccay@gmail.com]
Sent: Wednesday, September 18, 2013 6:27 AM
To: Zheng, Kai; Chen, Haifeng; common-dev@hadoop.apache.org
Subject: Re: [DISCUSS] Security Efforts and Branching

All -

I apologize for not following up sooner. I have been heads down on some other matters that
required my attention.

It seems that it may be easier to move forward by gaining consensus a little bit at a time
rather than trying to hit the ground running where the other thread left off.

Would it be agreeable to everyone to start with an inventory of the existing Jiras that have
patches available or nearly available so that we can determine what concrete bits we have
to start with?

Once we get that done, we can try and frame a set of goals to make up the initial iteration
and determine what from the inventory will be leveraged in that iteration.

Does this sound reasonable to everyone?
Would anyone like to propose another starting point?

thanks,

--larry

On Wed, Sep 4, 2013 at 4:26 PM, larry mccay <larry.mccay@gmail.com>
wrote:
It doesn't look like the PDF made it all the way through to the archives and maybe even to
recipients - so the following is the text version of the iteration-1 draft:

Iteration 1: Pluggable User Authentication and Federation

Introduction
The intent of this effort is to bootstrap the development of pluggable token-based authentication
mechanisms to support certain goals of enterprise authentication integrations. By restricting
the scope of this effort, we hope to provide immediate benefit to the community while keeping
the initial contribution to a manageable size that can be easily reviewed, understood and
extended with further development through follow up JIRAs and related iterations.

Iteration Endstate
Once complete, this effort will have extended the authentication mechanisms - for all client
types - from the existing Simple, Kerberos and Plain (for RPC) to include LDAP authentication
and SAML-based federation. In addition, users will be able to plug in additional/custom
authentication mechanisms of their choice.

Project Scope
The scope of this effort is a subset of the features covered by the overviews of HADOOP-9392
and HADOOP-9533. This effort concentrates on enabling Hadoop to issue and accept/validate SSO
tokens of its own. The pluggable authentication mechanism within the SASL/RPC layer and the
authentication filter pluggability for REST and UI components will be leveraged and extended to
support the results of this effort.

Out of Scope
In order to scope the initial deliverable as the minimally viable product, a handful of things
have been simplified or left out of scope for this effort. This is not meant to say that these
aspects are not useful or not needed but that they are not necessary for this iteration. We
do however need to ensure that we don't do anything to preclude adding them in future iterations.
1. Additional Attributes - the result of authentication will continue to use the existing
hadoop tokens and identity representations. Additional attributes used for finer grained
authorization decisions will be added through follow-up efforts.
2. Token revocation - the ability to revoke issued identity tokens will be added later.
3. Multi-factor authentication - this will likely require additional attributes and is not
necessary for this iteration.
4. Authorization changes - we will require additional attributes for the fine-grained access
control plans. This is not needed for this iteration.
5. Domains - we assume a single flat domain for all users.
6. Kinit alternative - we can leverage existing REST clients such as cURL to retrieve tokens
through authentication and federation for the time being.
7. A specific authentication framework isn't really necessary within the REST endpoints for
this iteration. If one is available then we can use it; otherwise we can leverage existing
things like Apache Shiro within a servlet filter.

In Scope
What is in scope for this effort is defined by the usecases described below. Components required
for supporting the usecases are summarized for each client type. Each component is a candidate
for a JIRA subtask - though multiple components are likely to be included in a JIRA to represent
a set of functionality rather than individual JIRAs per component.

Terminology and Naming
The terms and names of components within this document are merely descriptive of the
functionality that they represent. Any similarity to or difference from names or terms found in
other documents is not intended to make any statement about those other documents or the
descriptions within them. This document represents the pluggable authentication mechanisms and
server functionality required to replace Kerberos.

Ultimately, the naming of the implementation classes will be a product of the patches accepted
by the community.

Usecases:
client types: REST, CLI, UI
authentication types: Simple, Kerberos, authentication/LDAP, federation/SAML

Simple and Kerberos
Simple and Kerberos usecases continue to work as they do today. Authentication/LDAP and
Federation/SAML are added through the existing pluggability points, either as they are or with
required extensions. Either way, continued support for Simple and Kerberos must not require
changes to existing deployments in the field as a result of this effort.

REST
USECASE REST-1 Authentication/LDAP:
For REST clients, we will provide the ability to:
1. use cURL to Authenticate via LDAP through an IdP endpoint exposed by an AuthenticationServer
instance via REST calls to:
   a. authenticate - passing username/password returning a hadoop id_token
   b. get-access-token - from the TokenGrantingService by passing the hadoop id_token as an
Authorization: Bearer token along with the desired service name (master service name) returning
a hadoop access token
2. Successfully invoke a hadoop service REST API passing the hadoop access token through an
HTTP header as an Authorization Bearer token
   a. validation of the incoming token on the service endpoint is accomplished by an SSOAuthenticationHandler
3. Successfully block access to a REST resource when presenting a hadoop access token intended
for a different service
   a. validation of the incoming token on the service endpoint is accomplished by an SSOAuthenticationHandler
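
For concreteness, the two token calls plus the final service call above might look like the
following from a Java client (the usecase itself uses cURL). The endpoint URLs and query
parameters here are assumptions, not a defined API; the REST-2 federation flow below has the
same shape with step 1a replaced by the federate call:

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    public class RestSsoFlowSketch {
      // Minimal GET helper; real code would handle errors and timeouts.
      static String get(String url, String authHeader) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("Authorization", authHeader);
        try (InputStream in = conn.getInputStream()) {
          return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
      }

      public static void main(String[] args) throws Exception {
        // 1a. authenticate: username/password in exchange for a hadoop id_token
        String basic = Base64.getEncoder()
            .encodeToString("alice:secret".getBytes(StandardCharsets.UTF_8));
        String idToken = get("https://has.example.com/authenticate", "Basic " + basic);

        // 1b. get-access-token: id_token plus desired service name in exchange
        //     for a hadoop access token scoped to that service
        String accessToken = get(
            "https://has.example.com/get-access-token?service=hdfs",
            "Bearer " + idToken);

        // 2. invoke a hadoop service REST API; the SSOAuthenticationHandler
        //    on the endpoint validates the presented access token
        System.out.println(get(
            "https://namenode.example.com:50070/webhdfs/v1/tmp?op=LISTSTATUS",
            "Bearer " + accessToken));
      }
    }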

USECASE REST-2 Federation/SAML:
We will also provide federation capabilities for REST clients such that:
1. acquire a SAML assertion token from a trusted IdP (Shibboleth?) and persist it in a
permissions-protected file - i.e. ~/.hadoop_tokens/.idp_token
2. use cURL to Federate a token from a trusted IdP through an SP endpoint exposed by an AuthenticationServer(FederationServer?)
instance via REST calls to:
   a. federate - passing a SAML assertion as an Authorization: Bearer token returning a hadoop
id_token
      - can copy and paste from the command line or use cat to include the persisted token through
--header "Authorization: Bearer $(cat ~/.hadoop_tokens/.idp_token)"
   b. get-access-token - from the TokenGrantingService by passing the hadoop id_token as an
Authorization: Bearer token along with the desired service name (master service name), returning
a hadoop access token
3. Successfully invoke a hadoop service REST API passing the hadoop access token through an
HTTP header as an Authorization Bearer token
   a. validation of the incoming token on the service endpoint is accomplished by an SSOAuthenticationHandler
4. Successfully block access to a REST resource when presenting a hadoop access token intended
for a different service
   a. validation of the incoming token on the service endpoint is accomplished by an SSOAuthenticationHandler
REQUIRED COMPONENTS for REST USECASES:
COMP-1. REST client - cURL or similar
COMP-2. REST endpoint for BASIC authentication to LDAP - IdP endpoint example - returning
hadoop id_token
COMP-3. REST endpoint for federation with SAML Bearer token - shibboleth SP?|OpenSAML? - returning
hadoop id_token
COMP-4. REST TokenGrantingServer endpoint for acquiring hadoop access tokens from hadoop id_tokens
COMP-5. SSOAuthenticationHandler to validate incoming hadoop access tokens
COMP-6. some source of a SAML assertion - shibboleth IdP?
COMP-7. hadoop token and authority implementations
COMP-8. core services for crypto support for signing, verifying and PKI management
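
COMP-8 largely amounts to standard public-key signing and verification; the hard parts are key
distribution and PKI management rather than the crypto itself. A minimal sketch with the JDK's
built-in primitives (a real implementation would obtain keys from the Key Provider rather than
generating them in place):

    import java.security.KeyPair;
    import java.security.KeyPairGenerator;
    import java.security.Signature;

    public class TokenSigningSketch {
      public static void main(String[] args) throws Exception {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair keys = gen.generateKeyPair();

        byte[] tokenBytes = "hadoop id_token payload".getBytes("UTF-8");

        // The issuer signs the token content with its private key.
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(keys.getPrivate());
        signer.update(tokenBytes);
        byte[] signature = signer.sign();

        // A service endpoint verifies with the issuer's public key.
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(keys.getPublic());
        verifier.update(tokenBytes);
        System.out.println("signature valid: " + verifier.verify(signature));
      }
    }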

CLI
USECASE CLI-1 Authentication/LDAP:
For CLI/RPC clients, we will provide the ability to:
1. use cURL to Authenticate via LDAP through an IdP endpoint exposed by an AuthenticationServer
instance via REST calls to:
   a. authenticate - passing username/password returning a hadoop id_token
      - for RPC clients we need to persist the returned hadoop identity token in a file protected
by fs permissions so that it may be leveraged until expiry
      - directing the returned response to a file may suffice for now, something like ">~/.hadoop_tokens/.id_token"
2. use hadoop CLI to invoke RPC API on a specific hadoop service
   a. RPC client negotiates a TokenAuth method through the SASL layer; the hadoop id_token is
retrieved from ~/.hadoop_tokens/.id_token and passed as an Authorization: Bearer token to the
get-access-token REST endpoint exposed by the TokenGrantingService, returning a hadoop access token
   b. RPC server side validates the presented hadoop access token and continues to serve request
   c. Successfully invoke a hadoop service RPC API

USECASE CLI-2 Federation/SAML:
For CLI/RPC clients, we will provide the ability to:
1. acquire a SAML assertion token from a trusted IdP (Shibboleth?) and persist it in a
permissions-protected file - i.e. ~/.hadoop_tokens/.idp_token
2. use cURL to Federate a token from a trusted IdP through an SP endpoint exposed by an AuthenticationServer(FederationServer?)
instance via REST calls to:
   a. federate - passing a SAML assertion as an Authorization: Bearer token returning a hadoop
id_token
      - can copy and paste from the command line or use cat to include the previously persisted
token through --header "Authorization: Bearer $(cat ~/.hadoop_tokens/.idp_token)"
3. use hadoop CLI to invoke RPC API on a specific hadoop service
   a. RPC client negotiates a TokenAuth method through the SASL layer; the hadoop id_token is
retrieved from ~/.hadoop_tokens/.id_token and passed as an Authorization: Bearer token to the
get-access-token REST endpoint exposed by the TokenGrantingService, returning a hadoop access token
   b. RPC server side validates the presented hadoop access token and continues to serve request
   c. Successfully invoke a hadoop service RPC API

REQUIRED COMPONENTS for CLI USECASES - (beyond those required for REST):
COMP-9. TokenAuth Method negotiation, etc.
COMP-10. Client side implementation to leverage REST endpoint for acquiring hadoop access
tokens given a hadoop id_token
COMP-11. Server side implementation to validate incoming hadoop access tokens
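
For COMP-11, the server-side checks come down to signature, expiry, and intended service, which
also covers the "block access with a token intended for a different service" usecases above. A
hypothetical sketch (the AccessToken type and its accessors are made up for illustration):

    // Stand-in for whatever token class the framework finally defines.
    interface AccessToken {
      String getSubject();
      String getServiceName();
      long getExpiryTime();
    }

    // Hypothetical sketch of COMP-11: server-side access token validation.
    public class AccessTokenValidator {
      private final String localServiceName;

      public AccessTokenValidator(String localServiceName) {
        this.localServiceName = localServiceName;
      }

      // Returns the authenticated principal, or throws if the token is invalid.
      public String validate(AccessToken token) {
        if (!verifySignature(token)) {
          throw new SecurityException("bad token signature");
        }
        if (System.currentTimeMillis() > token.getExpiryTime()) {
          throw new SecurityException("token expired");
        }
        if (!localServiceName.equals(token.getServiceName())) {
          // Reject tokens issued for a different service.
          throw new SecurityException("token not intended for " + localServiceName);
        }
        return token.getSubject();
      }

      private boolean verifySignature(AccessToken token) {
        // Would delegate to the crypto core (COMP-8) with the issuer's public key.
        throw new UnsupportedOperationException("not part of this sketch");
      }
    }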

UI
Various Hadoop services have their own web UI consoles for administration and end user
interactions. These consoles need to benefit from the pluggability of authentication mechanisms
as well, to be on par with the access control of the cluster REST and RPC APIs.
Web consoles are protected with a WebSSOAuthenticationHandler, which will be configured for
either authentication or federation.

USECASE UI-1 Authentication/LDAP:
For the authentication usecase:
1. User's browser requests access to a UI console page
2. WebSSOAuthenticationHandler intercepts the request and redirects the browser to an IdP
web endpoint exposed by the AuthenticationServer passing the requested url as the redirect_url
3. IdP web endpoint presents the user with a FORM over https
   a. user provides username/password and submits the FORM
4. AuthenticationServer authenticates the user with provided credentials against the configured
LDAP server and:
   a. leverages a servlet filter or other authentication mechanism for the endpoint and authenticates
the user with a simple LDAP bind with username and password
   b. acquires a hadoop id_token and uses it to acquire the required hadoop access token which
is added as a cookie
   c. redirects the browser to the original service UI resource via the provided redirect_url
5. WebSSOAuthenticationHandler for the original UI resource interrogates the incoming request
for an authcookie that contains an access token and, upon finding one:
   a. validates the incoming token
   b. returns the AuthenticationToken as per AuthenticationHandler contract
   c. AuthenticationFilter adds the hadoop auth cookie with the expected token
   d. serves requested resource for valid tokens
   e. subsequent requests are handled by the AuthenticationFilter's recognition of the hadoop
auth cookie

USECASE UI-2 Federation/SAML:
For the federation usecase:
1. User's browser requests access to a UI console page
2. WebSSOAuthenticationHandler intercepts the request and redirects the browser to an SP web
endpoint exposed by the AuthenticationServer passing the requested url as the redirect_url.
This endpoint:
   a. is dedicated to redirecting to the external IdP, passing the required parameters, which
may include a redirect_url back to itself, as well as encoding the original redirect_url so
that it can recover it on the way back to the client
3. the IdP:
   a. challenges the user for credentials and authenticates the user
   b. creates appropriate token/cookie and redirects back to the AuthenticationServer endpoint
4. AuthenticationServer endpoint:
   a. extracts the expected token/cookie from the incoming request and validates it
   b. creates a hadoop id_token
   c. acquires a hadoop access token for the id_token
   d. creates appropriate cookie and redirects back to the original redirect_url - being the
requested resource
5. WebSSOAuthenticationHandler for the original UI resource interrogates the incoming request
for an authcookie that contains an access token and, upon finding one:
   a. validates the incoming token
   b. returns the AuthenticationToken as per the AuthenticationHandler contract
   c. AuthenticationFilter adds the hadoop auth cookie with the expected token
   d. serves requested resource for valid tokens
   e. subsequent requests are handled by the AuthenticationFilter's recognition of the hadoop
auth cookie
REQUIRED COMPONENTS for UI USECASES:
COMP-12. WebSSOAuthenticationHandler
COMP-13. IdP Web Endpoint within AuthenticationServer for FORM based login
COMP-14. SP Web Endpoint within AuthenticationServer for 3rd party token federation
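
If the existing hadoop-auth AuthenticationHandler contract is the pluggability point used, as
the UI flows above suggest, COMP-12 might be sketched as follows. The cookie name, the redirect
construction, and the token parsing are all assumptions for illustration:

    import java.io.IOException;
    import java.util.Properties;
    import javax.servlet.http.Cookie;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.apache.hadoop.security.authentication.server.AuthenticationHandler;
    import org.apache.hadoop.security.authentication.server.AuthenticationToken;

    // Sketch of COMP-12 against the hadoop-auth AuthenticationHandler contract.
    public class WebSSOAuthenticationHandler implements AuthenticationHandler {
      public static final String TYPE = "websso";
      private String authServerUrl; // IdP or SP endpoint on the AuthenticationServer

      @Override
      public String getType() { return TYPE; }

      @Override
      public void init(Properties config) {
        authServerUrl = config.getProperty("websso.authserver.url");
      }

      @Override
      public void destroy() { }

      @Override
      public boolean managementOperation(AuthenticationToken token,
          HttpServletRequest request, HttpServletResponse response) {
        return true; // no management operations in this sketch
      }

      @Override
      public AuthenticationToken authenticate(HttpServletRequest request,
          HttpServletResponse response) throws IOException {
        // Step 5: look for the authcookie carrying a hadoop access token.
        Cookie[] cookies = request.getCookies();
        if (cookies != null) {
          for (Cookie cookie : cookies) {
            if ("hadoop-access-token".equals(cookie.getName())) {
              // Validation and principal extraction elided (see COMP-11 sketch).
              String principal = extractPrincipal(cookie.getValue());
              return new AuthenticationToken(principal, principal, TYPE);
            }
          }
        }
        // Step 2: no token yet; redirect to the AuthenticationServer endpoint
        // with the requested url so the flow can come back here.
        response.sendRedirect(authServerUrl
            + "?redirect_url=" + request.getRequestURL());
        return null; // browser is mid-flow; not authenticated yet
      }

      private String extractPrincipal(String tokenValue) {
        return tokenValue; // placeholder for real token validation/parsing
      }
    }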

On Wed, Sep 4, 2013 at 3:06 PM, larry mccay <lmccay@apache.org>
wrote:
Hello Kai, Jerry and common-dev'ers -

I would like to try and get a game plan together for how we go about getting some of these
larger security changes into branches that are manageable, reviewable and ultimately mergeable
in a timely manner.

In order to even start this discussion, I think we need an inventory of the high level projects
that are underway in parallel. We can then identify those that are at the point where patches
can be used to seed a branch. This will give us some insight into how to break it into phases.

Off the top of my head, I can think of the following high level efforts:

1. Pluggable Authentication and Token based SSO
2. CryptoFS for volume level encryption
3. Hive Table/Column Level Encryption (admittedly this is Hive work but it will leverage common
work done in Hadoop)
4. Authorization

Now, #1 and #2 above have related Jiras and a number of patches available and are therefore
early contenders for branching.

#1 has a draft for an initial iteration that was discussed in another thread and I will attach
a pdf version of the iteration-1 proposal to this mail.

I propose that we converge on an initial plan based on further discussion of the attached
iteration and file a Jira to represent that iteration. We can then break down the larger patches
on existing Jiras to fit into the constrained scope of the agreed upon iteration and attach
them to subtasks of the iteration Jira.

We can then seed a Pluggable Authentication and Token based SSO branch with those related
patches from H-9392, H-9534, H-9781.

Now, whether we introduce a whole central sso service in that branch is up for discussion
but I personally think that it will violate the "keeping it small and manageable" goal. I
am wondering whether a branch for security services would do well to decouple the consumers
from a specific implementation that happens to be remote. Then within the Pluggable Authentication
branch - we can concentrate on the consumer level and local implementations.

I assume that the CryptoFS work is also intended to be done within the branches and we have
to therefore consider how to leverage common code for things like key access for encryption/decryption
and signing/verifying. This sort of thing is being introduced by H-9534 as part of the Pluggable
Authentication branch in support of JWT tokens. So, we will have to think through what branches
are required for Crypto in the near term.

Perhaps, we can concentrate on those portions of crypto that will be of immediate benefit
to iteration-1 and leave higher order CryptoFS stuff to another iteration? I don't think that
we want an explosion of branches at any given time. If we can limit it to specific areas,
close down on the iteration, and get it merged before creating a new set of branches, that
would be best. Again, ease of review, test and merge is important for us.

I am curious how development across related branches like these would work, though. If the
service work needs to leverage work from the other, how do we do that easily? Can we branch
a branch? Will that require both to be ready to merge at the same time?

Perhaps, low-level dependencies can be duplicated for some time and then consolidated later?

Anyway, specific questions:

Does the proposal to start with the attached iteration-1 draft to create an iteration Jira
make sense to everyone?

Does anyone have specific suggestions regarding the best way for managing branches that should
be decoupled but at the same time leverage common code?

Any other thoughts or insight?

thanks,

--larry




