From: Joe Stein
To: "dev@kafka.apache.org"
Date: Wed, 1 Oct 2014 12:44:16 -0400
Subject: Re: Two open issues on Kafka security

Hi Jonathan,

"Hadoop delegation tokens to enable MapReduce, Samza, or other frameworks
running in the Hadoop environment to access Kafka"

https://cwiki.apache.org/confluence/display/KAFKA/Security is on the list,
yup!

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop
********************************************/

On Wed, Oct 1, 2014 at 12:35 PM, Jonathan Creasy wrote:

> This is not nearly as deep as the discussion so far, but I did want to
> throw this idea out there to make sure we've thought about it.
>
> The Kafka project should make sure that, when deployed alongside a Hadoop
> cluster from any major distribution, it can tie seamlessly into the
> authentication and authorization used within that cluster: for example,
> Apache Sentry.
>
> This may present additional difficulties that mean a decision is made not
> to do that; alternatively, the Kerberos authentication and the
> authorization schemes we are already working on may be sufficient.
>
> I'm not sure that anything I've read so far in this discussion actually
> poses a problem, but I'm an Ops guy, and being able to more easily
> integrate more things makes my life better.
> :)
>
> -Jonathan
>
> On 9/30/14, 11:26 PM, "Joe Stein" wrote:
>
> > inline
> >
> > On Tue, Sep 30, 2014 at 11:58 PM, Jay Kreps wrote:
> >
> >> Hey Joe,
> >>
> >> For (1) what are you thinking for the PermissionManager api?
> >>
> >> The way I see it, the first question we have to answer is whether it
> >> is possible to make authentication and authorization independent. What
> >> I mean by that is whether I can write an authorization library that
> >> will work the same whether you authenticate with SSL or Kerberos.
> >
> > To me that is a requirement. We can't tie them together. We have to
> > provide the ability for authorization to work regardless of the
> > authentication. One *VERY* important use case is the level of trust in
> > the authentication, from the authorization perspective: I authorize an
> > "identity" based on how it authenticated. Alice is able to view topic X
> > if Alice authenticated over Kerberos. Bob isn't allowed to view topic X
> > no matter what. Alice can also authenticate over something other than
> > Kerberos (there are use cases for that), and in that case Alice
> > wouldn't see topic X. A concrete use case for this with Kafka would be
> > a third-party bank consuming data from a broker. The service provider
> > would have some local Kerberos auth for that bank, used to do backups,
> > that would also have access to other topics related to that bank's
> > data; the bank itself, over SSL, wants a stream of events (some
> > specific topic), and that bank's identity only sees that topic. It is
> > important not to confuse identity, authentication, and authorization.
> >
> >> If so, then we need to pick some subset of identity information that
> >> we can extract from both and have this constitute the identity we pass
> >> into the authorization interface. The original proposal had just the
> >> username/subject. But maybe we should add the IP address as well, as
> >> that is useful. What I would prefer not to do is add everything in the
> >> certificate. I think the assumption is that you are generating these
> >> certificates for Kafka, so you can put whatever identity info you want
> >> in the Subject Alternative Name. If that is true, then just using that
> >> should be okay, right?
> >
> > I think we should just push the byte[] and let the plugin deal with it.
> > So if we have a certificate object, then pass that along with whatever
> > other metadata (e.g. the IP address of the client) we can. I don't
> > think we should do any parsing whatsoever; let the plugin deal with
> > that. Any parsing we do on the identity information for the "security
> > object" forces us into specific implementations, and I don't see any
> > reason to do that. If plug-ins want an "easier" time dealing with certs
> > and parsing, then we can implement some way they can do this without
> > much fuss. We also need to make sure that the crypto library is
> > pluggable (so we can expose an API for them to call), so that an HSM
> > can be easily dropped in without Kafka caring. So in the plugin we
> > could provide an identity.getAlternativeAttribute() and then that use
> > case is solved (and we can use Bouncy Castle or whatever to parse it
> > for them to make it easier), and always give them the raw bytes so
> > they could do it themselves.
> >
> >> -Jay
> >>
> >> On Tue, Sep 30, 2014 at 4:09 PM, Joe Stein wrote:
> >>
> >> > 1) We need to support the most flexibility we can and make this
> >> > transparent to Kafka (to use Gwen's term). Any specific
> >> > implementation is going to make it not work with some solution,
> >> > stopping people from using Kafka. That is a reality, because
> >> > everyone just does it slightly differently enough. If we have an
> >> > "identity" byte structure (let's not use a string, because some
> >> > security objects are bytes), this should just fall through to the
> >> > implementor. For certs this is the entire X.509 object (not just the
> >> > certificate part, as it could contain an ASN.1 timestamp), and
> >> > inside you parse it and do what you want with it.
> >> >
> >> > 2) While I think there are many benefits to just the handshake
> >> > approach, I don't think they outweigh the cons Jay expressed. a) We
> >> > can't lead the client libraries down a new path of interacting with
> >> > Kafka. By incrementally adding to the wire protocol we are directing
> >> > a very clear and expected approach. We already have issues with
> >> > implementation even with the wire protocol in place, and are trying
> >> > to improve that aspect of the community as a whole. Let's not take a
> >> > step backwards with this. Also, we need to not add more/different
> >> > hoops to debugging/administering/monitoring Kafka, so taking
> >> > advantage (as Jay says) of built-in logging (etc.) is important;
> >> > also for the client library developers too :)
> >> >
> >> > On Tue, Sep 30, 2014 at 6:44 PM, Gwen Shapira wrote:
> >> >
> >> >> Re #1:
> >> >>
> >> >> Since auth_to_local is a Kerberos config, it's up to the admin to
> >> >> decide how he likes the user names and set it up properly (or leave
> >> >> it empty) and make sure the ACLs match. Simplified names may be
> >> >> needed if the authorization system integrates with LDAP to get
> >> >> groups, or something fancy like that.
> >> >>
> >> >> Note that it's completely transparent to Kafka: if the admin sets
> >> >> up auth_to_local rules, we simply see a different principal name.
> >> >> No need to do anything different.
> >> >>
> >> >> Gwen
> >> >>
> >> >> On Tue, Sep 30, 2014 at 3:31 PM, Jay Kreps wrote:
> >> >>
> >> >> > Current proposal is here:
> >> >> >
> >> >> > https://cwiki.apache.org/confluence/display/KAFKA/Security
> >> >> >
> >> >> > Here are the two open questions I am aware of:
> >> >> >
> >> >> > 1. We want to separate authentication and authorization. This
> >> >> > means permissions will be assigned to some user-like
> >> >> > subject/entity/person string that is independent of the
> >> >> > authentication mechanism. It sounds like we agreed this could be
> >> >> > done, and we had in mind some krb-specific mangling that Gwen
> >> >> > knew about, and I think the plan was to use whatever the user
> >> >> > chose to put in the Subject Alternative Name of the cert for SSL.
> >> >> > So in both cases these would translate to a string denoting the
> >> >> > entity to whom we are granting permissions in the authorization
> >> >> > layer. We should document these in the wiki to get feedback on
> >> >> > them.
> >> >> >
> >> >> > The Hadoop approach to extraction was something like this:
> >> >> >
> >> >> > http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14-2-3-1.html
> >> >> >
> >> >> > But actually I'm not sure if just using the full Kerberos
> >> >> > principal is so bad, i.e. having the user be
> >> >> > jennifer@athena.mit.edu versus just jennifer. Where this would
> >> >> > make a difference would be in a case where you wanted the same
> >> >> > user/entity to be able to authenticate via different mechanisms
> >> >> > (Hadoop auth, Kerberos, SSL) and have a single set of
> >> >> > permissions.
> >> >> >
> >> >> > 2. For SASL/Kerberos we need to figure out how the communication
> >> >> > between client and server will be handled to pass the
> >> >> > challenge/response byte[]. I.e.
> >> >> >
> >> >> > http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslClient.html#evaluateChallenge(byte[])
> >> >> > http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslServer.html#evaluateResponse(byte[])
> >> >> >
> >> >> > I am not a super expert in this area, but I will try to give my
> >> >> > understanding, and I'm sure someone can correct me if I am
> >> >> > confused.
> >> >> >
> >> >> > Unlike SSL, the transmission of this is actually outside the
> >> >> > scope of SASL, so we have to specify it. Two proposals:
> >> >> >
> >> >> > Original Proposal: Add a new "authenticate" request/response
> >> >> >
> >> >> > The proposal in the original wiki was to add a new "authenticate"
> >> >> > request/response to pass this information. This matches what was
> >> >> > done in the Kerberos implementation for ZooKeeper. The intention
> >> >> > is that the client would send this request immediately after
> >> >> > establishing a connection, in which case it acts much like a
> >> >> > "handshake"; however, there is no requirement that they do so.
> >> >> >
> >> >> > Whether the authentication happens via SSL or via Kerberos, the
> >> >> > effect will just be to set the username in the session. This will
> >> >> > default to the "anybody" user, so in the default non-secure case
> >> >> > we will just be defaulting "anybody" to have full permission. So,
> >> >> > to answer the question about whether changing user is required or
> >> >> > not: I don't think it is, but I think we kind of get it for free
> >> >> > in this approach.
> >> >> >
> >> >> > In this approach there is no particular need or advantage to
> >> >> > having a separate port for Kerberos, I don't think.
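[Editor's aside: the evaluateChallenge/evaluateResponse loop the javadoc
links above refer to can be sketched with the JDK's own SASL API. This is
not Kafka code; it uses CRAM-MD5 purely because the JDK ships both the
client and server side of that mechanism (Kafka would use GSSAPI/Kerberos),
and all names ("alice", "kafka", "broker1.example.com", the password) are
made up for illustration. Each byte[] passed between the two objects is
exactly what would ride inside the proposed "authenticate"
request/response, or inside the raw handshake frames.]

```java
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.sasl.AuthorizeCallback;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslServer;

public class SaslLoopSketch {

    // Runs one challenge/response negotiation in-process and returns the
    // server's view of the result.
    public static String authenticate() throws Exception {
        char[] secret = "s3cret".toCharArray(); // shared secret, made up

        // Client side: supplies the username and password when asked.
        CallbackHandler clientCbh = callbacks -> {
            for (Callback cb : callbacks) {
                if (cb instanceof NameCallback) {
                    ((NameCallback) cb).setName("alice");
                } else if (cb instanceof PasswordCallback) {
                    ((PasswordCallback) cb).setPassword(secret);
                }
            }
        };

        // Server side: supplies the expected password for the claimed user
        // and approves the authorization id (here: self-authorization only).
        CallbackHandler serverCbh = callbacks -> {
            for (Callback cb : callbacks) {
                if (cb instanceof PasswordCallback) {
                    ((PasswordCallback) cb).setPassword(secret);
                } else if (cb instanceof AuthorizeCallback) {
                    AuthorizeCallback ac = (AuthorizeCallback) cb;
                    ac.setAuthorized(ac.getAuthenticationID()
                            .equals(ac.getAuthorizationID()));
                }
            }
        };

        SaslServer server = Sasl.createSaslServer(
                "CRAM-MD5", "kafka", "broker1.example.com", null, serverCbh);
        SaslClient client = Sasl.createSaslClient(
                new String[] {"CRAM-MD5"}, null, "kafka",
                "broker1.example.com", null, clientCbh);

        // CRAM-MD5 is server-first: an empty response draws the challenge.
        byte[] challenge = server.evaluateResponse(new byte[0]);
        byte[] response = client.evaluateChallenge(challenge);
        server.evaluateResponse(response); // verifies the client's digest

        return server.isComplete() + " as " + server.getAuthorizationID();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(authenticate());
    }
}
```

Note that SASL only defines how the byte[] tokens are produced and
consumed; getting them across the wire is exactly the open question in
this thread.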
> >> >
> >> > Alternate Proposal: Create a Handshake
> >> >
> >> > The alternative I think Michael was proposing was to create a
> >> > handshake that would happen at connection time on connections coming
> >> > in on the SASL port. This would require a separate port for SASL,
> >> > since otherwise you wouldn't be able to tell whether the bytes you
> >> > were getting were for SASL or were the first request of an
> >> > unauthenticated connection.
> >> >
> >> > Michael, it would be good to work out the details of how this works.
> >> > Are we just sending size-delimited byte arrays back and forth until
> >> > the challenge/response terminates?
> >> >
> >> > My Take
> >> >
> >> > The pro I see for Michael's proposal is that it keeps the
> >> > authentication logic more localized in the socket server.
> >> >
> >> > I see two cons:
> >> > 1. Since the handshake won't go through the normal API layer, it
> >> > won't go through the normal logging (e.g. the request log), JMX
> >> > monitoring, client trace token, correlation id, etc. that we get for
> >> > other requests. This could make operations a little confusing and
> >> > make debugging a little harder, since the client will be blocking on
> >> > network requests without the normal logging.
> >> > 2. This part of the protocol will be inconsistent with the rest of
> >> > the Kafka protocol, so it will be a little odd for client
> >> > implementors, as this will effectively be a request/response that
> >> > they will have to implement that is different from all the other
> >> > request/responses they implement.
> >> >
> >> > In practice these two alternatives are not very different, except
> >> > that in the original proposal the bytes you send are prefixed by the
> >> > normal request header fields such as the client id, correlation id,
> >> > etc. Overall I would prefer this, as I think it is a bit more
> >> > consistent from the client's point of view.
> >> >
> >> > Cheers,
> >> >
> >> > -Jay
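[Editor's aside: the "size-delimited byte arrays back and forth" framing
Jay asks about can be sketched in a few lines: a 4-byte big-endian length
prefix followed by the raw SASL token, which is the same size-delimited
convention the Kafka wire protocol already uses for whole requests. The
class and token names below are illustrative, and the in-memory "wire"
stands in for a real socket.]

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class SaslFraming {

    // Writes one SASL token as a 4-byte big-endian length followed by the
    // raw bytes of the token.
    public static void writeToken(DataOutputStream out, byte[] token)
            throws IOException {
        out.writeInt(token.length);
        out.write(token);
        out.flush();
    }

    // Reads one size-delimited token; blocks until the full frame arrives.
    public static byte[] readToken(DataInputStream in) throws IOException {
        byte[] token = new byte[in.readInt()];
        in.readFully(token);
        return token;
    }

    public static void main(String[] args) throws IOException {
        // Simulate one leg of the exchange over an in-memory "wire".
        byte[] challenge =
                "step-1-challenge".getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeToken(new DataOutputStream(wire), challenge);

        byte[] received = readToken(new DataInputStream(
                new ByteArrayInputStream(wire.toByteArray())));

        System.out.println(wire.size() + " bytes on the wire, token: "
                + new String(received, StandardCharsets.UTF_8));
    }
}
```

The trade-off Jay describes is visible here: these frames carry no client
id or correlation id, so they would bypass the request log and the other
per-request machinery, whereas wrapping the same token bytes in a normal
"authenticate" request would keep them.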