From: Larry McCay <lmccay@hortonworks.com>
Subject: Re: [DISCUSS] Hadoop SSO/Token Server Components
Date: Thu, 4 Jul 2013 12:18:29 -0400
To: "Zheng, Kai"
Cc: common-dev@hadoop.apache.org, "Li, Tianyou"

*sigh*

I'm not sure how I am failing to communicate this, but I will try to briefly do it again…

I never asked for differences between the two silo'd jiras, and I am attempting not to speak to them within this thread, as that is causing thrashing that we can't really afford.

There have been a number of folks working on security features within the community across projects. Many of these have been rather isolated things that needed to be done, and not much community involvement was needed. As we look into these larger endeavors, working in silos without a cohesive community is a problem.
We are trying to introduce a community for security as a cross-cutting concern throughout the Hadoop ecosystem.

In order to do this, we need to step back and approach the whole effort as a community. We identified a couple of ways to start this:

1. use common-dev as the security community email list - at least for the time being
2. find a wiki space to articulate a holistic view of the security model and drive changes from that common understanding
3. begin the community work by focusing on this authentication alternative to kerberos

Here is what was agreed upon to be discussed by the community for #3 above:

1. restart with a clean slate - define and meet the goals of the community with a single design/vision
2. scope the effort to authentication while keeping in mind not to preclude other related aspects of the Hadoop security roadmap - authorization, auditing, etc.
3. we are looking for an alternative to kerberos authentication for users - not for services; at least for the first phase, services would continue to authenticate using kerberos, though that needs to be made easier
4. we would enumerate the high-level components needed for this kerberos alternative
5. we would then drill down into the details of the components
6. finally, we would identify the seams of separation that allow for parallel work and get the vision delivered

This email was intended to facilitate the discussion of those things.

Comparing and contrasting the two silo'd jiras sets this community work back instead of moving it forward.

We have a need with a very manageable scope and could use your help in defining it from the context of your current work.

As Aaron stated, the community discussions around this topic have been encouraging, and I also hope that they and the security community continue and grow.
Regarding the discussion points that still have not been addressed, I can see one possible additional component - though perhaps it is an aspect of the authentication providers - that you list below as one of the "differences". That would be your thinking around the use of domains for multi-tenancy. I have trouble separating user domains from the IdPs deployed in the enterprise or cloud environment. Can you elaborate on how these domains relate to those that may be found within a particular IdP offering and how they work together or complement each other? From that description, we should be able to determine whether it is an aspect of the pluggable authentication providers or something that should be considered a separate component.

I will be less available for the rest of the day - 4th of July stuff.

On Jul 4, 2013, at 7:21 AM, "Zheng, Kai" wrote:

> Hi Larry,
>
> Our design, from its first revision, focuses on and provides comprehensive support for pluggable authentication mechanisms based on a common token, trying to address single sign-on issues across the ecosystem to support access to Hadoop services via RPC, REST, and the web browser SSO flow. The updated design doc adds more text and flows to explain or illustrate these existing items in detail, as requested by some on the JIRA.
>
> In addition to the identity token we had proposed, we adopted an access token and adapted the approach, not only for the sake of making TokenAuth compatible with HSSO, but also for better support of fine-grained access control and seamless integration with our authorization framework, and even third-party authorization services like an OAuth Authorization Server. We regard these as important because Hadoop is evolving into an enterprise and cloud platform that needs a complete authN and authZ solution, and without this support we would need future rework to complete the solution.
>
> Since you asked about the differences between TokenAuth and HSSO, here are some key ones:
>
> TokenAuth supports TAS federation to allow clients to access multiple clusters without a centralized SSO server, while HSSO provides a centralized SSO server for multiple clusters.
>
> TokenAuth integrates an authorization framework with auditing support in order to provide a complete solution for enterprise data access security. This allows administrators to administer security policies centrally and have the policies be enforced consistently across components in the ecosystem, in a pluggable way that supports different authorization models like RBAC, ABAC, and even XACML standards.
>
> TokenAuth targets support for domain-based authN & authZ to allow multi-tenant deployments. Authentication and authorization rules can be configured and enforced per domain, which allows organizations to manage their individual policies separately while sharing a common large pool of resources.
>
> TokenAuth addresses the proxy/impersonation case with the flow Tianyou mentioned, where a service can proxy a client to access another service in a secured and constrained way.
>
> Regarding token-based authentication plus SSO and a unified authorization framework, HADOOP-9392 and HADOOP-9466: let's continue to use these as umbrella JIRAs for these efforts. HSSO targets support for a centralized SSO server for multiple clusters and, as we have pointed out before, is a nice subset of the work proposed on HADOOP-9392. Let's align these two JIRAs and address the question Kevin raised multiple times in the 9392/9533 JIRAs: "How can HSSO and TAS work together? What is the relationship?". The design update I provided was meant to provide the necessary details so we can nail down that relationship and collaborate on the implementation of these JIRAs.
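The proxy/impersonation case mentioned above - a service accessing another service on a user's behalf in a constrained way - can likewise be sketched with a token that names both identities. The claim names ("sub", "act", "aud") and the HMAC signing are illustrative assumptions only, not the actual proposal:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # illustration only; real trust would come from PKI

def mint(claims: dict) -> str:
    """Serialize claims and append an HMAC-SHA256 signature."""
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode()).decode()
    return body + "." + hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()

def mint_proxy_token(user: str, proxying_service: str, target_service: str) -> str:
    # "sub" is the user being acted for; "act" records the proxying service,
    # so the target can enforce constraints on both identities.
    return mint({"sub": user, "act": proxying_service, "aud": target_service})

def verify(token: str) -> dict:
    """Check integrity before trusting any claim in the token."""
    body, sig = token.split(".")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("token failed verification")
    return json.loads(base64.urlsafe_b64decode(body))
```

Because both identities are inside the signed body, the target service can audit the real user while rate-limiting or scoping the proxying service.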
>
> As you have also confirmed, this design aligns with related community discussions, so let's continue our collaborative effort to contribute code to these JIRAs.
>
> Regards,
> Kai
>
> -----Original Message-----
> From: Larry McCay [mailto:lmccay@hortonworks.com]
> Sent: Thursday, July 04, 2013 4:10 AM
> To: Zheng, Kai
> Cc: common-dev@hadoop.apache.org
> Subject: Re: [DISCUSS] Hadoop SSO/Token Server Components
>
> Hi Kai -
>
> I think that I need to clarify something...
>
> This is not an update for 9533 but a continuation of the discussions that are focused on a fresh look at an SSO for Hadoop.
> We've agreed to leave our previous designs behind, and therefore we aren't really seeing it as an "HSSO layered on top of TAS" approach or an HSSO vs. TAS discussion.
>
> Your latest design revision actually makes it clear that you are now targeting exactly what was described as HSSO - so comparing and contrasting is not going to add any value.
>
> What we need you to do at this point is to look at those high-level components described on this thread and comment on whether we need additional components, or whether any that are listed don't seem necessary to you, and why.
> In other words, we need to define and agree on the work that has to be done.
>
> We also need to determine those components that need to be done before anything else can be started.
> I happen to agree with Brian that #4, Hadoop SSO Tokens, is central to all the other components and should probably be defined and POC'd in short order.
>
> Personally, I think that continuing the separation of 9533 and 9392 will do this effort a disservice. There don't seem to be enough differences between the two to justify separate jiras anymore. It may be best to file a new one that reflects a single vision without the extra cruft that has built up in either of the existing ones. We would certainly reference the existing ones within the new one.
> This approach would align with the spirit of the discussions up to this point.
>
> I am prepared to start a discussion around the shape of the two Hadoop SSO tokens - identity and access - if this is what others feel the next topic should be.
> If we can identify a jira home for it, we can do it there - otherwise we can create another DISCUSS thread for it.
>
> thanks,
>
> --larry
>
>
> On Jul 3, 2013, at 2:39 PM, "Zheng, Kai" wrote:
>
>> Hi Larry,
>>
>> Thanks for the update. Good to see that with this update we are now aligned on most points.
>>
>> I have also updated our TokenAuth design in HADOOP-9392. The new revision incorporates feedback and suggestions from related discussions with the community, particularly from Microsoft and others attending the security design lounge session at the Hadoop Summit. Summary of the changes:
>> 1. Revised the approach to now use two tokens - an Identity Token plus an Access Token - particularly considering our authorization framework and compatibility with HSSO;
>> 2. Introduced the Authorization Server (AS) from our authorization framework into the flow that issues access tokens for clients with identity tokens to access services;
>> 3. Refined the proxy access token and the proxy/impersonation flow;
>> 4. Refined the browser web SSO flow regarding access to Hadoop web services;
>> 5. Added the Hadoop RPC access flow regarding CLI clients accessing Hadoop services via RPC/SASL;
>> 6. Added a client authentication integration flow to illustrate how desktop logins can be integrated into the authentication process to TAS to exchange an identity token;
>> 7. Introduced the fine-grained access control flow from the authorization framework; I have put it in the appendices section for reference;
>> 8. Added a detailed flow to illustrate Hadoop Simple authentication over TokenAuth, in the appendices section;
>> 9. Added a secured task launcher in the appendices as a possible solution for the Windows platform;
>> 10.
>> Moved low-level content and less relevant parts from the main body into the appendices section.
>>
>> As we all think about how to layer HSSO on TAS in the TokenAuth framework, please take some time to look at the doc and then let's discuss the gaps we might have. I would like to discuss these gaps with a focus on the implementation details so we are all moving towards getting code done. Let's continue this part of the discussion in HADOOP-9392 to allow for better tracking on the JIRA itself. For discussions related to the centralized SSO server, I suggest we continue to use HADOOP-9533 to consolidate all discussion related to that JIRA. That way we don't need extra umbrella JIRAs.
>>
>> I agree we should speed up these discussions and agree on some of the implementation specifics so both of us can get moving on the code while not stepping on each other in our work.
>>
>> Look forward to your comments and comments from others in the community. Thanks.
>>
>> Regards,
>> Kai
>>
>> -----Original Message-----
>> From: Larry McCay [mailto:lmccay@hortonworks.com]
>> Sent: Wednesday, July 03, 2013 4:04 AM
>> To: common-dev@hadoop.apache.org
>> Subject: [DISCUSS] Hadoop SSO/Token Server Components
>>
>> All -
>>
>> As a follow-up to the discussions that were had during Hadoop Summit, I would like to introduce the discussion topic around the moving parts of a Hadoop SSO/Token Service.
>> There are a couple of related jiras that can be referenced and may or may not be updated as a result of this discuss thread.
>>
>> https://issues.apache.org/jira/browse/HADOOP-9533
>> https://issues.apache.org/jira/browse/HADOOP-9392
>>
>> As the first aspect of the discussion, we should probably state the overall goals and scoping for this effort:
>> * An alternative authentication mechanism to Kerberos for user authentication
>> * A broader capability for integration into enterprise identity and SSO solutions
>> * Possibly the advertisement/negotiation of available authentication mechanisms
>> * Backward compatibility for the existing use of Kerberos
>> * No (or minimal) changes to existing Hadoop tokens (delegation, job, block access, etc.)
>> * Pluggable authentication mechanisms across RPC, REST, and webui enforcement points
>> * Continued support for existing authorization policy/ACLs, etc.
>> * Keeping more fine-grained authorization policies in mind - like attribute-based access control; fine-grained access control is a separate but related effort that we must not preclude with this effort
>> * Cross-cluster SSO
>>
>> In order to tease out the moving parts, here are a couple of high-level and simplified descriptions of SSO interaction flow:
>>
>>                           +------+
>>   +------+ credentials 1  | SSO  |
>>   |CLIENT|--------------->|SERVER|
>>   +------+    :tokens     +------+
>>        2 |
>>          | access token
>>          V :requested resource
>>      +-------+
>>      |HADOOP |
>>      |SERVICE|
>>      +-------+
>>
>> The above diagram represents the simplest interaction model for an SSO service in Hadoop.
>> 1. Client authenticates to the SSO service and acquires an access token.
>>    a. Client presents credentials to an authentication service endpoint exposed by the SSO server (AS) and receives a token representing the authentication event and verified identity.
>>    b. Client then presents the identity token from 1.a to the token endpoint exposed by the SSO server (TGS) to request an access token to a particular Hadoop service, and receives an access token.
>> 2. Client presents the Hadoop access token to the Hadoop service for which the access token has been granted and requests the desired resource or services.
>>    a. The access token is presented as appropriate for the service endpoint protocol being used.
>>    b. The Hadoop service token validation handler validates the token and verifies its integrity and the identity of the issuer.
>>
>>      +------+
>>      | IdP  |
>>      +------+
>>    1 ^  credentials
>>      |  :idp_token
>>      |
>>   +------+  idp_token 2   +------+
>>   |CLIENT|--------------->| SSO  |
>>   +------+    :tokens     |SERVER|
>>                           +------+
>>        3 |
>>          | access token
>>          V :requested resource
>>      +-------+
>>      |HADOOP |
>>      |SERVICE|
>>      +-------+
>>
>> The above diagram represents a slightly more complicated interaction model for an SSO service in Hadoop that removes Hadoop from the credential collection business.
>> 1. Client authenticates to a trusted identity provider within the enterprise and acquires an IdP-specific token.
>>    a. Client presents credentials to an enterprise IdP and receives a token representing the authenticated identity.
>> 2. Client authenticates to the SSO service and acquires an access token.
>>    a. Client presents the idp_token to an authentication service endpoint exposed by the SSO server (AS) and receives a token representing the authentication event and verified identity.
>>    b. Client then presents the identity token from 2.a to the token endpoint exposed by the SSO server (TGS) to request an access token to a particular Hadoop service, and receives an access token.
>> 3. Client presents the Hadoop access token to the Hadoop service for which the access token has been granted and requests the desired resource or services.
>>    a. The access token is presented as appropriate for the service endpoint protocol being used.
>>    b. The Hadoop service token validation handler validates the token and verifies its integrity and the identity of the issuer.
>>
>> Considering the above set of goals and the high-level interaction flow description, we can start to discuss the component inventory required to accomplish this vision:
>>
>> 1. SSO Server Instance: this component must be able to expose endpoints both for authentication of users, by collecting and validating credentials, and for federation of identities represented by tokens from trusted IdPs within the enterprise. The endpoints should be composable so as to allow for multifactor authentication mechanisms. They will also need to return tokens that represent the authentication event and verified identity, as well as access tokens for specific Hadoop services.
>>
>> 2. Authentication Providers: pluggable authentication mechanisms must be easily created and configured for use within the SSO server instance. They will ideally allow the enterprise to plug in their preferred off-the-shelf components as well as provide custom providers. Supporting existing standards for such authentication providers should be a top-priority concern. There are a number of standard approaches in use in the Java world: JAAS LoginModules, servlet filters, JASPIC auth modules, etc. A pluggable provider architecture that allows the enterprise to leverage existing investments in these technologies, and existing skill sets, would be ideal.
>>
>> 3. Token Authority: a token authority component would need to have the ability to issue, verify, and revoke tokens. This authority will need to be trusted by all enforcement points that need to verify incoming tokens. Using something like PKI for establishing trust will be required.
>>
>> 4.
>> Hadoop SSO Tokens: the exact shape and form of the SSO tokens will need to be considered in order to determine the means by which trust and integrity are ensured while using them. There may be some abstraction of the underlying format provided through interface-based design, but all token implementations will need to have the same attributes and capabilities in terms of validation and cryptographic verification.
>>
>> 5. SSO Protocol: the lowest-common-denominator protocol for SSO server interactions across client types would likely be REST. Depending on the REST client in use, it may require explicitly coding to the token flow described in the earlier interaction descriptions, or a plugin may be provided for things like HTTPClient, curl, etc. RPC clients will have this taken care of for them within the SASL layer and will leverage the REST endpoints as well. This likely implies trust requirements for the RPC client to be able to trust the SSO server's identity cert that is presented over SSL.
>>
>> 6. REST Client Agent Plugins: required for encapsulating the interaction with the SSO server for the client programming models. We may need these for many client types: e.g. Java, JavaScript, .Net, Python, cURL, etc.
>>
>> 7. Server-Side Authentication Handlers: the server side of the REST, RPC, or webui connection will need to be able to validate and verify the incoming Hadoop tokens in order to grant or deny access to requested resources.
>>
>> 8. Credential/Trust Management: throughout the system - on both the client and server sides - we will need to manage and provide access to PKI and potentially shared-secret artifacts in order to establish the required trust relationships to replace the mutual authentication that would otherwise be provided by using kerberos everywhere.
>>
>> So, discussion points:
>>
>> 1. Are there additional components that would be required for a Hadoop SSO service?
>> 2.
>> Should any of the above described components be considered not actually necessary, or are any poorly described?
>> 3. Should we create a new umbrella Jira to identify each of these as subtasks?
>> 4. Should we just continue to use 9533 for the SSO server and add additional subtasks?
>> 5. What are the natural seams of separation between these components, and what dependencies between one and another affect priority?
>>
>> Obviously, each component that we identify will - more than likely - have a jira of its own, so we are only trying to identify the high-level descriptions for now.
>>
>> Can we try to drive this discussion to a close by the end of the week? This will allow us to start breaking out into component implementation plans.
>>
>> thanks,
>>
>> --larry
>
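The interaction flow from the quoted [DISCUSS] email - credentials to the SSO server's authentication endpoint (AS) for an identity token, an exchange at the token endpoint (TGS) for a service-scoped access token, then validation by the Hadoop service's token validation handler - can be sketched end to end as follows. The class, the in-memory credential store, and the HMAC signing are all assumptions made for this sketch; the thread itself anticipates PKI for establishing trust between the token authority and enforcement points:

```python
import base64
import hashlib
import hmac
import json

class SsoServer:
    """Toy stand-in for the SSO server's AS and TGS endpoints."""
    def __init__(self, key: bytes):
        self._key = key
        self._users = {"alice": "secret"}  # illustrative credential store

    def _sign(self, claims: dict) -> str:
        body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode()).decode()
        return body + "." + hmac.new(self._key, body.encode(), hashlib.sha256).hexdigest()

    def authenticate(self, user: str, password: str) -> str:
        # AS endpoint: credentials in, identity token out.
        if self._users.get(user) != password:
            raise PermissionError("authentication failed")
        return self._sign({"type": "identity", "sub": user})

    def grant_access(self, identity_token: str, service: str) -> str:
        # TGS endpoint: identity token in, service-scoped access token out.
        claims = validate(self._key, identity_token)
        if claims["type"] != "identity":
            raise PermissionError("identity token required")
        return self._sign({"type": "access", "sub": claims["sub"], "aud": service})

def validate(key: bytes, token: str) -> dict:
    """Server-side handler: verify integrity and issuer before granting access."""
    body, sig = token.split(".")
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("token failed verification")
    return json.loads(base64.urlsafe_b64decode(body))
```

In this toy model the Hadoop service shares the key with the SSO server, which is exactly the shortcut the real design would avoid by signing tokens with the token authority's private key and verifying with its certificate.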