From: Don Bosco Durai
Subject: Re: Two open issues on Kafka security
Date: Thu, 2 Oct 2014 10:54:27 -0700
To: dev@kafka.apache.org

I agree, username+IP would be sufficient.

I assume that when authentication is turned off or doesn't exist, but the authorization plugin is enabled, the username would be empty or passed as "nobody", but with a valid IP (if available).

> The name "context" is probably not the right one.
> The idea is to have an
> object into which we can easily add additional properties in the future
> to support additional authorization libraries without breaking backward
> compatibility with existing ones.

+1. Makes the design scalable.

Thanks

Bosco

>
>
> ----- Original message -----
> From: Jarek Jarcec Cecho
> To: dev@kafka.apache.org
> Subject: Re: Two open issues on Kafka security
> Date: Thu, 2 Oct 2014 08:33:45 -0700
>
> Thanks for getting back Jay!
>
> For the interface - Looking at Sentry and other authorization libraries
> in the Hadoop ecosystem, it seems that "username" is primarily used to
> perform authorization these days, and then IP for auditing. Hence I feel
> that username+IP would be sufficient, at least for now. However, I would
> assume that in the future we might need more than just those two, so
> what about defining the API in a way that we can easily extend in the
> future? Something like:
>
> authorize(Context, Entity, Action), where
>
> * Action - the action that the user is trying to perform (write to a topic,
> read from a topic, create a topic, ...)
> * Entity - the entity that the user is trying to perform that action on
> (topic, ...)
> * Context - a container with user/session information - user name, IP
> address, or perhaps the entire certificate, as was suggested earlier in the
> email thread.
>
> The name "context" is probably not the right one. The idea is to have an
> object into which we can easily add additional properties in the future
> to support additional authorization libraries without breaking backward
> compatibility with existing ones.
>
> The hierarchy is an interesting topic - I'm not familiar enough with Kafka
> internals, so I can't really speak to how much more complex it would
> be. I can speak about Sentry and the way we designed the security model for
> Hive and Search, where introducing the hierarchy wasn't complex at all
> and actually led to a cleaner model.
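[Editor's note: a minimal Java sketch of what the authorize(Context, Entity, Action) call proposed above might look like. All names here are illustrative assumptions for discussion, not any actual Kafka API.]

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed pluggable authorizer interface.
// Context is an extensible property bag so new fields (certificates,
// session data, ...) can be added later without breaking existing plugins.
interface Authorizer {
    boolean authorize(Context context, Entity entity, Action action);
}

enum Action { READ, WRITE, CREATE }

// Entity: the thing the action targets, e.g. a topic.
class Entity {
    final String type;
    final String name;
    Entity(String type, String name) { this.type = type; this.name = name; }
}

class Context {
    private final Map<String, Object> properties = new HashMap<>();

    Context with(String key, Object value) {
        properties.put(key, value);
        return this;
    }

    Object get(String key) {
        return properties.get(key);
    }
}

// A trivial example plugin: user jarcec may write to topic "logs".
class ExampleAuthorizer implements Authorizer {
    public boolean authorize(Context ctx, Entity entity, Action action) {
        return "jarcec".equals(ctx.get("username"))
                && "logs".equals(entity.name)
                && action == Action.WRITE;
    }
}
```

Because plugins read Context by key, adding, say, a raw certificate later would not break an existing plugin that only looks at "username" and "ip".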
> The biggest user-visible benefit
> is that you don't have to deal with special rules such as "give READ
> privilege to user jarcec on ALL topics". If you have a singleton parent
> entity (service, or whatever name seems more accurate), you can simply
> say that you have READ access on this root entity and all
> topics will then inherit it.
>
> Jarcec
>
> On Oct 1, 2014, at 9:33 PM, Jay Kreps wrote:
>
>> Hey Jarek,
>>
>> I agree with the importance of separating authentication and
>> authorization. The question is what concept of identity is sufficient
>> to pass through to the authorization layer? Just a "user name"? Or
>> perhaps you also need the ip the request originated from? Whatever
>> these would be, it would be nice to enumerate them so the authz portion
>> can be written in a way that ignores the authn part.
>>
>> So if no one else proposes anything different, maybe we can just say
>> user name + ip?
>>
>> With respect to hierarchy, it would be nice to have topic hierarchies,
>> but we don't have them now, so it seems overkill to try to think them
>> through wrt security now, right?
>>
>> -Jay
>>
>>
>>
>> On Wed, Oct 1, 2014 at 1:13 PM, Jarek Jarcec Cecho wrote:
>>> I'm following the security proposal wiki page [1] and this discussion, and I would like to jump in with a few points if I might :) Let me start by saying that I like the material and the discussion here - good work!
>>>
>>> I was part of the team who originally designed and worked on Sentry, and I wanted to share a few thoughts to see how they resonate with people. My first and probably biggest point would be to separate authorization and authentication as two separate systems. I believe that Jay has already stressed that in the email thread, but I wanted to reiterate that point.
>>> In my experience, users don't care that much about how a user has been authenticated as long as they trust that mechanism; what they care more about is that the authorization model is consistent and behaves the same way. E.g. if I configured that user jarcec can write into topic "logs", he should be able to do that no matter where the connection came from - whether he has been authenticated via Kerberos because he is directly exploring the data from his computer, via a delegation token because he is running MapReduce jobs calculating statistics, or via an SSL certificate because ... (well, I'm missing a good example here, but you're probably following my point).
>>>
>>> I've also noticed that we are planning to have no hierarchy in the authz object model per the wiki [1], with the reasoning that Kafka does not support topic hierarchies. I see that point, but at the same time it got me thinking - are we sure that Kafka will never have hierarchical topics? It seems like a nice feature that might be usable for some use cases and something that we might want to add in the future. But regardless of that, I would suggest introducing a hierarchy anyway, even if it were just two levels. In Sentry (for Hive) we've introduced the concept of a "Service", where all the databases are children of the service. In Kafka I would imagine that we would have a "service" with "topics" as its children. Having this makes it much easier to model general privileges where you need to grant access to all topics - you just grant access to the entire service and all topics "inherit" it.
>>>
>>> I'm wondering what other people's thoughts are?
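[Editor's note: the two-level service -> topic inheritance described above can be sketched as follows. This is a hypothetical illustration of the idea, assuming grants are stored per entity name and a topic falls back to its parent "service" grant; it is not Sentry's or Kafka's actual implementation.]

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of two-level privilege inheritance: a READ grant on the
// singleton "service" entity is inherited by every topic, replacing
// special rules like "give READ on ALL topics to user jarcec".
class HierarchicalAcl {
    // key: entity name ("service" or a topic name); value: users with READ
    private final Map<String, Set<String>> readGrants = new HashMap<>();

    void grantRead(String entity, String user) {
        readGrants.computeIfAbsent(entity, k -> new HashSet<>()).add(user);
    }

    boolean canRead(String topic, String user) {
        // direct grant on the topic, or one inherited from the parent service
        return readGrants.getOrDefault(topic, Collections.emptySet()).contains(user)
                || readGrants.getOrDefault("service", Collections.emptySet()).contains(user);
    }
}
```

With this shape, granting READ on "service" to a user makes canRead return true for every topic, while per-topic grants still work for users without the service-level privilege.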
>>>
>>> Jarcec
>>>
>>> Links:
>>> 1: https://cwiki.apache.org/confluence/display/KAFKA/Security
>>>
>>> On Oct 1, 2014, at 9:44 AM, Joe Stein wrote:
>>>
>>>> Hi Jonathan,
>>>>
>>>> "Hadoop delegation tokens to enable MapReduce, Samza, or other frameworks
>>>> running in the Hadoop environment to access Kafka"
>>>> https://cwiki.apache.org/confluence/display/KAFKA/Security is on the list,
>>>> yup!
>>>>
>>>> /*******************************************
>>>> Joe Stein
>>>> Founder, Principal Consultant
>>>> Big Data Open Source Security LLC
>>>> http://www.stealth.ly
>>>> Twitter: @allthingshadoop
>>>> ********************************************/
>>>>
>>>> On Wed, Oct 1, 2014 at 12:35 PM, Jonathan Creasy wrote:
>>>>
>>>>> This is not nearly as deep as the discussion so far, but I did want to
>>>>> throw this idea out there to make sure we've thought about it.
>>>>>
>>>>> The Kafka project should make sure that when deployed alongside a Hadoop
>>>>> cluster from any major distribution, it can tie seamlessly into the
>>>>> authentication and authorization used within that cluster - for example,
>>>>> Apache Sentry.
>>>>>
>>>>> This may present additional difficulties, which might mean a decision is made
>>>>> not to do that; alternatively, the Kerberos authentication and the
>>>>> authorization schemes we are already working on may be sufficient.
>>>>>
>>>>> I'm not sure that anything I've read so far in this discussion actually
>>>>> poses a problem, but I'm an Ops guy, and being able to more easily
>>>>> integrate more things makes my life better. :)
>>>>>
>>>>> -Jonathan
>>>>>
>>>>> On 9/30/14, 11:26 PM, "Joe Stein" wrote:
>>>>>
>>>>>> inline
>>>>>>
>>>>>> On Tue, Sep 30, 2014 at 11:58 PM, Jay Kreps wrote:
>>>>>>
>>>>>>> Hey Joe,
>>>>>>>
>>>>>>> For (1) what are you thinking for the PermissionManager api?
>>>>>>>
>>>>>>> The way I see it, the first question we have to answer is whether it
>>>>>>> is possible to make authentication and authorization independent. What
>>>>>>> I mean by that is whether I can write an authorization library that
>>>>>>> will work the same whether you authenticate with ssl or kerberos.
>>>>>>
>>>>>>
>>>>>> To me that is a requirement. We can't tie them together. We have to
>>>>>> provide the ability for authorization to work regardless of the
>>>>>> authentication. One *VERY* important use case is the level of trust in
>>>>>> the authentication from the authorization perspective, e.g. I authorize an
>>>>>> "identity" based on how it authenticated... Alice is able to view
>>>>>> topic X if Alice authenticated over Kerberos. Bob isn't allowed to view
>>>>>> topic X no matter what. Alice can authenticate over something other than
>>>>>> Kerberos (there are use cases for that), and in that case Alice wouldn't
>>>>>> see topic X. A concrete use case for this with Kafka would be a third-party
>>>>>> bank consuming data from a broker. The service provider would have some
>>>>>> local Kerberos auth for that bank to do backups, which would also have
>>>>>> access to other topics related to that bank's data... the bank itself, over
>>>>>> SSL, wants a stream of events (some specific topic), and that bank's
>>>>>> identity only sees that topic. It is important not to confuse identity,
>>>>>> authentication, and authorization.
>>>>>>
>>>>>>
>>>>>>> If
>>>>>>> so then we need to pick some subset of identity information that we
>>>>>>> can extract from both and have this constitute the identity we pass
>>>>>>> into the authorization interface. The original proposal had just the
>>>>>>> username/subject. But maybe we should add the ip address as well, as
>>>>>>> that is useful. What I would prefer not to do is add everything in the
>>>>>>> certificate.
>>>>>>> I think the assumption is that you are generating these
>>>>>>> certificates for Kafka, so you can put whatever identity info you want
>>>>>>> in the Subject Alternative Name. If that is true, then just using that
>>>>>>> should be okay, right?
>>>>>>>
>>>>>>
>>>>>> I think we should just push the byte[] and let the plugin deal with it.
>>>>>> So, if we have a certificate object, then pass that along with whatever
>>>>>> other metadata (e.g. the IP address of the client) we can. I don't think we
>>>>>> should do any parsing whatsoever; let the plugin deal with that. Any
>>>>>> parsing we do on the identity information for the "security object" forces
>>>>>> us into specific implementations, and I don't see any reason to do that...
>>>>>> If plug-ins want an "easier" time dealing with certs and parsing and blah
>>>>>> blah blah, then we can implement some way they can do this without much
>>>>>> fuss... we also need to make sure that the crypto library is pluggable too
>>>>>> (so we can expose an API for them to call) so that an HSM can be easily
>>>>>> dropped in without Kafka caring... so in the plugin we could provide an
>>>>>> identity.getAlternativeAttribute() and then that use case is solved (and
>>>>>> we can use Bouncy Castle or whatever to parse it for them to make it
>>>>>> easier)... and always give them raw bytes so they could do it themselves.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> -Jay
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 30, 2014 at 4:09 PM, Joe Stein wrote:
>>>>>>>> 1) We need to support the most flexibility we can and make this
>>>>>>>> transparent to kafka (to use Gwen's term). Any specific implementation
>>>>>>>> is going to make it not work with some solution, stopping people from
>>>>>>>> using Kafka. That is a reality because everyone just does it slightly
>>>>>>>> differently enough.
>>>>>>>> If we have an "identity" byte structure (let's not use a string because
>>>>>>>> some security objects are bytes), this should just fall through to the
>>>>>>>> implementor. For certs this is the entire x509 object (not just the
>>>>>>>> certificate part, as it could contain an ASN.1 timestamp), and inside
>>>>>>>> you parse and do what you want with it.
>>>>>>>>
>>>>>>>> 2) While I think there are many benefits to just the handshake approach,
>>>>>>>> I don't think they outweigh the cons Jay expressed. a) We can't lead the
>>>>>>>> client libraries down a new path of interacting with Kafka. By
>>>>>>>> incrementally adding to the wire protocol we are directing a very clear
>>>>>>>> and expected approach. We already have issues with implementations even
>>>>>>>> with the wire protocol in place and are trying to improve that aspect of
>>>>>>>> the community as a whole. Let's not take a step backwards with this
>>>>>>>> there... also we need to not add more/different hoops to
>>>>>>>> debugging/administering/monitoring kafka, so taking advantage (as Jay
>>>>>>>> says) of built-in logging (etc.) is important... also for the client
>>>>>>>> library developers too :)
>>>>>>>>
>>>>>>>> On Tue, Sep 30, 2014 at 6:44 PM, Gwen Shapira wrote:
>>>>>>>>
>>>>>>>>> Re #1:
>>>>>>>>>
>>>>>>>>> Since auth_to_local is a Kerberos config, it's up to the admin to
>>>>>>>>> decide how he likes the user names and set it up properly (or leave it
>>>>>>>>> empty) and make sure the ACLs match. Simplified names may be needed if
>>>>>>>>> the authorization system integrates with LDAP to get groups, or
>>>>>>>>> something fancy like that.
>>>>>>>>>
>>>>>>>>> Note that it's completely transparent to Kafka - if the admin sets up
>>>>>>>>> auth_to_local rules, we simply see a different principal name. No need
>>>>>>>>> to do anything different.
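[Editor's note: for reference, the principal-to-username mapping Gwen describes is configured with auth_to_local rules in krb5.conf. A sketch, with an illustrative realm name - the exact rules depend on each site's naming conventions:]

```
[realms]
  ATHENA.MIT.EDU = {
    # Map e.g. jennifer@ATHENA.MIT.EDU -> jennifer by stripping the realm,
    # then fall back to the default mapping for anything that didn't match.
    auth_to_local = RULE:[1:$1@$0](.*@ATHENA\.MIT\.EDU)s/@.*//
    auth_to_local = DEFAULT
  }
```

Kafka would then see only the mapped short name as the principal, which is what makes the mapping transparent to the broker.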
>>>>>>>>>
>>>>>>>>> Gwen
>>>>>>>>>
>>>>>>>>> On Tue, Sep 30, 2014 at 3:31 PM, Jay Kreps wrote:
>>>>>>>>>> Current proposal is here:
>>>>>>>>>>
>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Security
>>>>>>>>>>
>>>>>>>>>> Here are the two open questions I am aware of:
>>>>>>>>>>
>>>>>>>>>> 1. We want to separate authentication and authorization. This means
>>>>>>>>>> permissions will be assigned to some user-like subject/entity/person
>>>>>>>>>> string that is independent of the authentication mechanism. It sounds
>>>>>>>>>> like we agreed this could be done, and we had in mind some
>>>>>>>>>> krb-specific mangling that Gwen knew about, and I think the plan was
>>>>>>>>>> to use whatever the user chose to put in the Subject Alternative Name
>>>>>>>>>> of the cert for ssl. So in both cases these would translate to a
>>>>>>>>>> string denoting the entity whom we are granting permissions to in the
>>>>>>>>>> authorization layer. We should document these in the wiki to get
>>>>>>>>>> feedback on them.
>>>>>>>>>>
>>>>>>>>>> The Hadoop approach to extraction was something like this:
>>>>>>>>>>
>>>>>>>>>> http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14-2-3-1.html
>>>>>>>>>>
>>>>>>>>>> But actually I'm not sure if just using the full kerberos principal
>>>>>>>>>> is so bad? I.e. having the user be jennifer@athena.mit.edu versus
>>>>>>>>>> just jennifer. Where this would make a difference would be in a case
>>>>>>>>>> where you wanted the same user/entity to be able to authenticate via
>>>>>>>>>> different mechanisms (Hadoop auth, kerberos, ssl) and have a single
>>>>>>>>>> set of permissions.
>>>>>>>>>>
>>>>>>>>>> 2.
>>>>>>>>>> For SASL/Kerberos we need to figure out how the communication
>>>>>>>>>> between client and server will be handled to pass the
>>>>>>>>>> challenge/response byte[]. I.e.
>>>>>>>>>>
>>>>>>>>>> http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslClient.html#evaluateChallenge(byte[])
>>>>>>>>>> http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslServer.html#evaluateResponse(byte[])
>>>>>>>>>>
>>>>>>>>>> I am not a super expert in this area, but I will try to give my
>>>>>>>>>> understanding, and I'm sure someone can correct me if I am confused.
>>>>>>>>>>
>>>>>>>>>> Unlike SSL, the transmission of this is actually outside the scope
>>>>>>>>>> of SASL, so we have to specify it. Two proposals:
>>>>>>>>>>
>>>>>>>>>> Original Proposal: Add a new "authenticate" request/response
>>>>>>>>>>
>>>>>>>>>> The proposal in the original wiki was to add a new "authenticate"
>>>>>>>>>> request/response to pass this information. This matches what was
>>>>>>>>>> done in the kerberos implementation for zookeeper. The intention is
>>>>>>>>>> that the client would send this request immediately after
>>>>>>>>>> establishing a connection, in which case it acts much like a
>>>>>>>>>> "handshake"; however, there is no requirement that they do so.
>>>>>>>>>>
>>>>>>>>>> Whether the authentication happens via SSL or via Kerberos, the
>>>>>>>>>> effect will just be to set the username in their session. This will
>>>>>>>>>> default to the "anybody" user. So in the default non-secure case we
>>>>>>>>>> will just be defaulting "anybody" to have full permission. So to
>>>>>>>>>> answer the question about whether changing user is required or not,
>>>>>>>>>> I don't think it is, but I think we kind of get it for free in this
>>>>>>>>>> approach.
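[Editor's note: mechanically, both proposals reduce to shuttling opaque SASL tokens back and forth until the exchange completes. A minimal sketch of size-delimited token framing is below; the 4-byte length prefix is an assumption for illustration, since the actual framing was exactly the open question here.]

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch of size-delimited framing for SASL challenge/response tokens.
// Each token is written as a 4-byte big-endian length followed by the
// raw bytes; the framing shown is hypothetical, not Kafka's actual wire
// format.
class SaslFraming {
    static void writeToken(DataOutputStream out, byte[] token) throws IOException {
        out.writeInt(token.length);
        out.write(token);
        out.flush();
    }

    static byte[] readToken(DataInputStream in) throws IOException {
        int len = in.readInt();
        byte[] token = new byte[len];
        in.readFully(token);
        return token;
    }
}
```

In use, the client would feed each token read this way to SaslClient.evaluateChallenge() and write the result back, looping until SaslClient.isComplete() returns true, while the server mirrors the loop with SaslServer.evaluateResponse(). In the original proposal these token bytes would additionally be wrapped in the normal Kafka request header (client id, correlation id, etc.).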
>>>>>>>>>>
>>>>>>>>>> In this approach there is no particular need or advantage to having
>>>>>>>>>> a separate port for kerberos, I don't think.
>>>>>>>>>>
>>>>>>>>>> Alternate Proposal: Create a Handshake
>>>>>>>>>>
>>>>>>>>>> The alternative I think Michael was proposing was to create a
>>>>>>>>>> handshake that would happen at connection time on connections coming
>>>>>>>>>> in on the SASL port. This would require a separate port for SASL,
>>>>>>>>>> since otherwise you wouldn't be able to tell if the bytes you were
>>>>>>>>>> getting were for SASL or were the first request of an
>>>>>>>>>> unauthenticated connection.
>>>>>>>>>>
>>>>>>>>>> Michael, it would be good to work out the details of how this works.
>>>>>>>>>> Are we just sending size-delimited byte arrays back and forth until
>>>>>>>>>> the challenge/response terminates?
>>>>>>>>>>
>>>>>>>>>> My Take
>>>>>>>>>>
>>>>>>>>>> The pro I see for Michael's proposal is that it keeps the
>>>>>>>>>> authentication logic more localized in the socket server.
>>>>>>>>>>
>>>>>>>>>> I see two cons:
>>>>>>>>>> 1. Since the handshake won't go through the normal api layer, it
>>>>>>>>>> won't go through the normal logging (e.g. request log), jmx
>>>>>>>>>> monitoring, client trace token, correlation id, etc. that we get for
>>>>>>>>>> other requests. This could make operations a little confusing and
>>>>>>>>>> make debugging a little harder, since the client will be blocking on
>>>>>>>>>> network requests without the normal logging.
>>>>>>>>>> 2. This part of the protocol will be inconsistent with the rest of
>>>>>>>>>> the Kafka protocol, so it will be a little odd for client
>>>>>>>>>> implementors, as it will effectively be a request/response that they
>>>>>>>>>> will have to implement that is different from all the other
>>>>>>>>>> request/responses they implement.
>>>>>>>>>>
>>>>>>>>>> In practice these two alternatives are not very different, except
>>>>>>>>>> that in the original proposal the bytes you send are prefixed by the
>>>>>>>>>> normal request header fields such as the client id, correlation id,
>>>>>>>>>> etc. Overall I would prefer this, as I think it is a bit more
>>>>>>>>>> consistent from the client's point of view.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>> -Jay