Mailing-List: contact dev-help@kafka.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@kafka.apache.org
Received-SPF: pass (athena.apache.org: domain of jay.kreps@gmail.com
 designates 74.125.82.51 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAA7ooCBdqA0FpGaq6jS6+6Qo-OWDNBp9WnsH2TyF17yj1D0ABw@mail.gmail.com>
References: 
 <CAOeJiJgssWUVSpMityGf8rRAOo9wd2dNtpKWrt8Ga8-CGi9P5g@mail.gmail.com>
	<CAHBV8WcdUua_pTjunJK-z6MW3b=n=hpF16oEHkXxkERd1X1TTw@mail.gmail.com>
	<CAA7ooCBdqA0FpGaq6jS6+6Qo-OWDNBp9WnsH2TyF17yj1D0ABw@mail.gmail.com>
Date: Tue, 30 Sep 2014 20:58:45 -0700
Message-ID: 
 <CAOeJiJhif3MTenKSaAQ9SXFvKxX7qfdDv3e4mdZsizDKyC2sKg@mail.gmail.com>
Subject: Re: Two open issues on Kafka security
From: Jay Kreps <jay.kreps@gmail.com>
To: "dev@kafka.apache.org" <dev@kafka.apache.org>
Content-Type: text/plain; charset=UTF-8

Hey Joe,

For (1) what are you thinking for the PermissionManager api?

The way I see it, the first question we have to answer is whether it
is possible to make authentication and authorization independent. What
I mean by that is whether I can write an authorization library that
will work the same whether you authenticate with ssl or kerberos. If
so then we need to pick some subset of identity information that we
can extract from both and have this constitute the identity we pass
into the authorization interface. The original proposal had just the
username/subject. But maybe we should add the ip address as well as
that is useful. What I would prefer not to do is add everything in the
certificate. I think the assumption is that you are generating these
certificates for Kafka so you can put whatever identity info you want
in the Subject Alternative Name. If that is true then just using that
should be okay, right?
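
To make the question concrete, here is a rough Java sketch of what a mechanism-independent identity and authorization interface could look like. Everything in it is hypothetical: the Identity class, the isPermitted signature, and the toy AclPermissionManager are made up for illustration and are not the API from the wiki. The only point is that the authorization side sees a subject plus an ip address and nothing about how the client authenticated.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical: the subset of identity info we could extract from both ssl and kerberos. */
final class Identity {
    final String subject;            // e.g. the cert's Subject Alternative Name or the kerberos principal
    final InetAddress clientAddress; // the ip address of the connection

    Identity(String subject, InetAddress clientAddress) {
        this.subject = subject;
        this.clientAddress = clientAddress;
    }

    /** Convenience factory for ip literals; wraps the checked exception. */
    static Identity of(String subject, String ipLiteral) {
        try {
            return new Identity(subject, InetAddress.getByName(ipLiteral));
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("bad address: " + ipLiteral, e);
        }
    }
}

/** Hypothetical authorization interface; it never learns how the client authenticated. */
interface PermissionManager {
    boolean isPermitted(Identity identity, String operation, String topic);
}

/** Toy in-memory implementation keyed on "subject|operation|topic". */
final class AclPermissionManager implements PermissionManager {
    private final Set<String> grants = new HashSet<>();

    void grant(String subject, String operation, String topic) {
        grants.add(subject + "|" + operation + "|" + topic);
    }

    @Override
    public boolean isPermitted(Identity identity, String operation, String topic) {
        return grants.contains(identity.subject + "|" + operation + "|" + topic);
    }
}
```

With something like this, a grant for "jennifer" would apply whether the subject came out of a certificate or a kerberos ticket, which is the property I am after.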

-Jay


On Tue, Sep 30, 2014 at 4:09 PM, Joe Stein <joe.stein@stealth.ly> wrote:
> 1) We need to support the most flexibility we can and make this transparent
> to kafka (to use Gwen's term).  Any specific implementation is going to
> make it not work with some solution, stopping people from using Kafka.  That
> is a reality because everyone does it just differently enough. If
> we have an "identity" byte structure (let's not use string because some
> security objects are bytes) this should just fall through to the
> implementor.  For certs this is the entire x509 object (not just the
> certificate part as it could contain an ASN.1 timestamp) and inside you
> parse and do what you want with it.
>
> 2) While I think there are many benefits to just the handshake approach I
> don't think it outweighs the cons Jay expressed. a) We can't lead the
> client libraries down a new path of interacting with Kafka.  By
> incrementally adding to the wire protocol we are directing a very clear and
> expected approach.  We already have issues with implementation even with
> the wire protocol in place and are trying to improve that aspect of the
> community as a whole.  Let's not take a step backwards there...
> also we need to not add more/different hoops to
> debugging/administering/monitoring kafka so taking advantage (as Jay says)
> of built in logging (etc) is important... also for the client library
> developers too :)
>
> On Tue, Sep 30, 2014 at 6:44 PM, Gwen Shapira <gshapira@cloudera.com> wrote:
>
>> Re #1:
>>
>> Since auth_to_local is a kerberos config, it's up to the admin to
>> decide how they like the user names, set it up properly (or leave it
>> empty), and make sure the ACLs match. Simplified names may be needed if
>> the authorization system integrates with LDAP to get groups or
>> something fancy like that.
>>
>> Note that it's completely transparent to Kafka - if the admin sets up
>> auth_to_local rules, we simply see a different principal name. No need
>> to do anything different.
>>
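As an illustration of the kind of simplification being described, here is a small Java sketch. It only mimics the usual default behaviour (drop the realm for a single-component principal in the default realm); real auth_to_local rules live in the kerberos config and can be much more elaborate. The PrincipalNames helper and its shortName method are hypothetical names, and Kafka itself would not need code like this since it only sees the resulting name.

```java
import java.util.Optional;

final class PrincipalNames {
    /**
     * Maps "user@REALM" to "user" when REALM equals defaultRealm and the name has
     * a single component; returns empty otherwise (no mapping applies).
     */
    static Optional<String> shortName(String principal, String defaultRealm) {
        int at = principal.lastIndexOf('@');
        if (at < 0) {
            return Optional.empty(); // no realm part at all
        }
        String name = principal.substring(0, at);
        String realm = principal.substring(at + 1);
        // Multi-component principals such as "kafka/broker1" are left unmapped.
        if (!realm.equals(defaultRealm) || name.contains("/")) {
            return Optional.empty();
        }
        return Optional.of(name);
    }
}
```

The ACLs then have to be written against whichever form the admin ends up with, the short name or the full principal.
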
>> Gwen
>>
>> On Tue, Sep 30, 2014 at 3:31 PM, Jay Kreps <jay.kreps@gmail.com> wrote:
>> > Current proposal is here:
>> >
>> > https://cwiki.apache.org/confluence/display/KAFKA/Security
>> >
>> > Here are the two open questions I am aware of:
>> >
>> > 1. We want to separate authentication and authorization. This means
>> > permissions will be assigned to some user-like subject/entity/person
>> > string that is independent of the authentication mechanism. It sounds
>> > like we agreed this could be done and we had in mind some krb-specific
>> > mangling that Gwen knew about and I think the plan was to use whatever
>> > the user chose to put in the Subject Alternative Name of the cert for
>> > ssl. So in both cases these would translate to a string denoting the
>> > entity whom we are granting permissions to in the authorization layer.
>> > We should document these in the wiki to get feedback on them.
>> >
>> > The Hadoop approach to extraction was something like this:
>> >
>> > http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14-2-3-1.html
>> >
>> > But actually I'm not sure if just using the full kerberos principal is
>> > so bad? I.e. having the user be jennifer@athena.mit.edu versus just
>> > jennifer. Where this would make a difference would be in a case where
>> > you wanted the same user/entity to be able to authenticate via
>> > different mechanisms (Hadoop auth, kerberos, ssl) and have a single
>> > set of permissions.
>> >
>> > 2. For SASL/Kerberos we need to figure out how the communication
>> > between client and server will be handled to pass the
>> > challenge/response byte[]. I.e.
>> >
>> >
>> > http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslClient.html#evaluateChallenge(byte[])
>> >
>> > http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslServer.html#evaluateResponse(byte[])
>> >
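For anyone who has not used these two methods, here is a minimal in-process sketch of the challenge/response loop they imply. To keep it runnable without a KDC it swaps in the CRAM-MD5 mechanism that ships with the JDK instead of GSSAPI/Kerberos, and it passes the byte[] values directly between client and server rather than over a socket. The user "alice", the password, the "kafka" protocol name, the host name, and the SaslLoopSketch class are all made up for the example.

```java
import java.util.Collections;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.sasl.AuthorizeCallback;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslException;
import javax.security.sasl.SaslServer;

final class SaslLoopSketch {
    // One handler shared by both sides to keep the sketch short: it supplies the
    // made-up credentials and approves the authorization check the server makes.
    private static final CallbackHandler HANDLER = callbacks -> {
        for (Callback cb : callbacks) {
            if (cb instanceof NameCallback) {
                ((NameCallback) cb).setName("alice");
            } else if (cb instanceof PasswordCallback) {
                ((PasswordCallback) cb).setPassword("secret".toCharArray());
            } else if (cb instanceof AuthorizeCallback) {
                ((AuthorizeCallback) cb).setAuthorized(true);
            }
        }
    };

    /** Runs the whole exchange in memory and returns the id the server ends up with. */
    static String authenticate() throws SaslException {
        SaslClient client = Sasl.createSaslClient(new String[] {"CRAM-MD5"}, null,
                "kafka", "broker.example.com", Collections.<String, Object>emptyMap(), HANDLER);
        SaslServer server = Sasl.createSaslServer("CRAM-MD5",
                "kafka", "broker.example.com", Collections.<String, Object>emptyMap(), HANDLER);
        if (client == null || server == null) {
            throw new SaslException("CRAM-MD5 is not available in this JDK");
        }

        // Some mechanisms let the client speak first; CRAM-MD5 does not.
        byte[] response = client.hasInitialResponse()
                ? client.evaluateChallenge(new byte[0])
                : new byte[0];

        while (!server.isComplete()) {
            byte[] challenge = server.evaluateResponse(response);   // server side
            if (challenge != null && !client.isComplete()) {
                response = client.evaluateChallenge(challenge);     // client side
            } else if (!server.isComplete()) {
                throw new SaslException("exchange stalled before completing");
            }
        }
        return server.getAuthorizationID();
    }
}
```

In a real deployment each `response` and `challenge` array has to cross the network, and SASL says nothing about how.
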
>> > I am not super expert in this area but I will try to give my
>> > understanding and I'm sure someone can correct me if I am confused.
>> >
>> > Unlike SSL the transmission of this is actually outside the scope of
>> > SASL so we have to specify this. Two proposals:
>> >
>> > Original Proposal: Add a new "authenticate" request/response
>> >
>> > The proposal in the original wiki was to add a new "authenticate"
>> > request/response to pass this information. This matches what was done
>> > in the kerberos implementation for zookeeper. The intention is that
>> > the client would send this request immediately after establishing a
>> > connection, in which case it acts much like a "handshake", however
>> > there is no requirement that they do so.
>> >
>> > Whether the authentication happens via SSL or via Kerberos, the effect
>> > will just be to set the username in their session. This will default
>> > to the "anybody" user. So in the default non-secure case we will just
>> > be defaulting "anybody" to have full permission. So to answer the
>> > question about whether changing user is required or not, I don't think
>> > it is but I think we kind of get it for free in this approach.
>> >
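A tiny Java sketch of that session behaviour, purely as an illustration. The Session class and its method names are hypothetical; the only thing taken from the proposal is that the username starts out as "anybody" and is overwritten once either mechanism authenticates the connection.

```java
/** Hypothetical per-connection session state. */
final class Session {
    static final String ANYBODY = "anybody";

    // Starts as the default user; replaced when ssl or kerberos authentication succeeds.
    private volatile String username = ANYBODY;

    String username() {
        return username;
    }

    /** Called by whichever authentication path completed; may be called again to change user. */
    void authenticated(String subject) {
        this.username = subject;
    }
}
```

Because nothing ties the call to connection setup, a second authenticate on the same connection simply replaces the user, which is the "for free" part.
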
>> > In this approach I don't think there is any particular need or
>> > advantage to having a separate port for kerberos.
>> >
>> > Alternate Proposal: Create a Handshake
>> >
>> > The alternative I think Michael was proposing was to create a
>> > handshake that would happen at connection time on connections coming
>> > in on the SASL port. This would require a separate port for SASL since
>> > otherwise you wouldn't be able to tell if the bytes you were getting
>> > were for SASL or were the first request of an unauthenticated
>> > connection.
>> >
>> > Michael, it would be good to work out the details of how this works.
>> > Are we just sending size-delimited byte arrays back and forth until
>> > the challenge response terminates?
>> >
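If the answer is yes, the framing could be as simple as the following Java sketch. The 4-byte big-endian length prefix, the 64 KB cap, and the SizeDelimitedFrames class are assumptions made for the example, not something anyone has proposed in detail.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class SizeDelimitedFrames {
    static final int MAX_TOKEN_BYTES = 64 * 1024; // arbitrary cap, assumed for the sketch

    /** Writes one token as a 4-byte length followed by the raw bytes. */
    static void writeToken(DataOutputStream out, byte[] token) throws IOException {
        out.writeInt(token.length);
        out.write(token);
        out.flush();
    }

    /** Reads one token, rejecting sizes that are negative or over the cap. */
    static byte[] readToken(DataInputStream in) throws IOException {
        int size = in.readInt();
        if (size < 0 || size > MAX_TOKEN_BYTES) {
            throw new IOException("invalid token size: " + size);
        }
        byte[] token = new byte[size];
        in.readFully(token);
        return token;
    }

    /** In-memory round trip, standing in for a socket in this sketch. */
    static byte[] roundTrip(byte[] token) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            writeToken(new DataOutputStream(wire), token);
            return readToken(new DataInputStream(new ByteArrayInputStream(wire.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each side would call writeToken and readToken in turn until its SASL object reports completion.
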
>> > My Take
>> >
>> > The pro I see for Michael's proposal is that it keeps the
>> > authentication logic more localized in the socket server.
>> >
>> > I see two cons:
>> > 1. Since the handshake won't go through the normal api layer it won't
>> > go through the normal logging (e.g. request log), jmx monitoring,
>> > client trace token, correlation id, etc that we get for other
>> > requests. This could make operations a little confusing and make
>> > debugging a little harder since the client will be blocking on network
>> > requests without the normal logging.
>> > 2. This part of the protocol will be inconsistent with the rest of the
>> > Kafka protocol so it will be a little odd for client implementors as
>> > this will effectively be a request/response that they will have to
>> > implement that will be different from all the other request/responses
>> > they implement.
>> >
>> > In practice these two alternatives are not very different except that
>> > in the original proposal the bytes you send are prefixed by the normal
>> > request header fields such as the client id, correlation id, etc.
>> > Overall I would prefer this as I think it is a bit more consistent
>> > from the client's point of view.
>> >
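To show how small that difference is, here is a Java sketch of an "authenticate" request that carries the SASL bytes behind the usual request header (api key, api version, correlation id, client id) and the usual 4-byte size prefix. The api key value 999, the version 0, the layout of the body as a single length-prefixed byte array, and the AuthenticateRequestSketch class are all assumptions for illustration; no such request has been defined yet.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class AuthenticateRequestSketch {
    static final short HYPOTHETICAL_API_KEY = 999; // placeholder, not an assigned key
    static final short API_VERSION = 0;

    /** Encodes size + header + SASL token into a buffer ready to be written to a socket. */
    static ByteBuffer encode(int correlationId, String clientId, byte[] saslToken) {
        byte[] client = clientId.getBytes(StandardCharsets.UTF_8);
        int bodySize = 2 + 2 + 4          // api key, api version, correlation id
                + 2 + client.length       // client id as a length-prefixed string
                + 4 + saslToken.length;   // SASL bytes as a length-prefixed array
        ByteBuffer buf = ByteBuffer.allocate(4 + bodySize);
        buf.putInt(bodySize);
        buf.putShort(HYPOTHETICAL_API_KEY);
        buf.putShort(API_VERSION);
        buf.putInt(correlationId);
        buf.putShort((short) client.length);
        buf.put(client);
        buf.putInt(saslToken.length);
        buf.put(saslToken);
        buf.flip();
        return buf;
    }
}
```

Strip the header fields and what remains is essentially the size-delimited byte array of the handshake approach.
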
>> > Cheers,
>> >
>> > -Jay
>>