From: Joe Stein
To: "dev@kafka.apache.org"
Date: Wed, 1 Oct 2014 12:44:16 -0400
Subject: Re: Two open issues on Kafka security

Hi Jonathan,

"Hadoop delegation tokens to enable MapReduce, Samza, or other frameworks
running in the Hadoop environment to access Kafka"

https://cwiki.apache.org/confluence/display/KAFKA/Security is on the list,
yup!

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop
********************************************/

On Wed, Oct 1, 2014 at 12:35 PM, Jonathan Creasy wrote:

> This is not nearly as deep as the discussion so far, but I did want to
> throw this idea out there to make sure we've thought about it.
>
> The Kafka project should make sure that, when deployed alongside a Hadoop
> cluster from any major distribution, it can tie seamlessly into the
> authentication and authorization used within that cluster: for example,
> Apache Sentry.
>
> This may present additional difficulties that mean a decision is made not
> to do that; alternatively, the Kerberos authentication and the
> authorization schemes we are already working on may be sufficient.
>
> I'm not sure that anything I've read so far in this discussion actually
> poses a problem, but I'm an Ops guy, and being able to more easily
> integrate more things makes my life better.
> :)
>
> -Jonathan
>
> On 9/30/14, 11:26 PM, "Joe Stein" wrote:
>
> > inline
> >
> > On Tue, Sep 30, 2014 at 11:58 PM, Jay Kreps wrote:
> >
> >> Hey Joe,
> >>
> >> For (1) what are you thinking for the PermissionManager api?
> >>
> >> The way I see it, the first question we have to answer is whether it
> >> is possible to make authentication and authorization independent. What
> >> I mean by that is whether I can write an authorization library that
> >> will work the same whether you authenticate with SSL or Kerberos.
> >
> > To me that is a requirement. We can't tie them together. We have to
> > provide the ability for authorization to work regardless of the
> > authentication. One *VERY* important use case is the level of trust in
> > the authentication, from the authorization perspective: I authorize an
> > "identity" based on how it authenticated. Alice is able to view topic X
> > if Alice authenticated over Kerberos. Bob isn't allowed to view topic X
> > no matter what. Alice can also authenticate over something other than
> > Kerberos (there are use cases for that), and in that case Alice
> > wouldn't see topic X. A concrete use case for this with Kafka would be
> > a third-party bank consuming data from a broker. The service provider
> > would have some local Kerberos auth for that bank, used to do backups,
> > that would also have access to other topics related to that bank's
> > data; the bank itself, over SSL, wants a stream of events (some
> > specific topic), and that bank's identity only sees that topic. It is
> > important not to confuse identity, authentication, and authorization.
> >
> >> If so, then we need to pick some subset of identity information that
> >> we can extract from both and have this constitute the identity we pass
> >> into the authorization interface. The original proposal had just the
> >> username/subject. But maybe we should add the IP address as well, as
> >> that is useful. What I would prefer not to do is add everything in the
> >> certificate. I think the assumption is that you are generating these
> >> certificates for Kafka, so you can put whatever identity info you want
> >> in the Subject Alternative Name. If that is true, then just using that
> >> should be okay, right?
> >
> > I think we should just push the byte[] and let the plugin deal with it.
> > So if we have a certificate object, then pass that along with whatever
> > other metadata (e.g. the IP address of the client) we can. I don't
> > think we should do any parsing whatsoever; let the plugin deal with
> > that. Any parsing we do on the identity information for the "security
> > object" forces us into specific implementations, and I don't see any
> > reason to do that. If plug-ins want an "easier" time dealing with certs
> > and parsing, then we can implement some way they can do this without
> > much fuss. We also need to make sure that the crypto library is
> > pluggable (so we can expose an API for them to call), so that an HSM
> > can be easily dropped in without Kafka caring. So in the plugin we
> > could provide an identity.getAlternativeAttribute() and then that use
> > case is solved (and we can use Bouncy Castle or whatever to parse it
> > for them to make it easier), and always give them the raw bytes so
> > they could do it themselves.
> >
> >> -Jay
> >>
> >> On Tue, Sep 30, 2014 at 4:09 PM, Joe Stein wrote:
> >>
> >> > 1) We need to support the most flexibility we can and make this
> >> > transparent to Kafka (to use Gwen's term). Any specific
> >> > implementation is going to make it not work with some solution,
> >> > stopping people from using Kafka. That is a reality, because
> >> > everyone just does it slightly differently enough. If we have an
> >> > "identity" byte structure (let's not use a string, because some
> >> > security objects are bytes), this should just fall through to the
> >> > implementor. For certs this is the entire X.509 object (not just the
> >> > certificate part, as it could contain an ASN.1 timestamp), and
> >> > inside you parse it and do what you want with it.
> >> >
> >> > 2) While I think there are many benefits to just the handshake
> >> > approach, I don't think they outweigh the cons Jay expressed. a) We
> >> > can't lead the client libraries down a new path of interacting with
> >> > Kafka. By incrementally adding to the wire protocol we are directing
> >> > a very clear and expected approach. We already have issues with
> >> > implementation even with the wire protocol in place, and are trying
> >> > to improve that aspect of the community as a whole. Let's not take a
> >> > step backwards with this. Also, we need to not add more/different
> >> > hoops to debugging/administering/monitoring Kafka, so taking
> >> > advantage (as Jay says) of built-in logging (etc.) is important;
> >> > also for the client library developers too :)
> >> >
> >> > On Tue, Sep 30, 2014 at 6:44 PM, Gwen Shapira wrote:
> >> >
> >> >> Re #1:
> >> >>
> >> >> Since auth_to_local is a Kerberos config, it's up to the admin to
> >> >> decide how he likes the user names and set it up properly (or leave
> >> >> it empty) and make sure the ACLs match. Simplified names may be
> >> >> needed if the authorization system integrates with LDAP to get
> >> >> groups, or something fancy like that.
> >> >>
> >> >> Note that it's completely transparent to Kafka: if the admin sets
> >> >> up auth_to_local rules, we simply see a different principal name.
> >> >> No need to do anything different.
> >> >>
> >> >> Gwen
> >> >>
> >> >> On Tue, Sep 30, 2014 at 3:31 PM, Jay Kreps wrote:
> >> >>
> >> >> > Current proposal is here:
> >> >> >
> >> >> > https://cwiki.apache.org/confluence/display/KAFKA/Security
> >> >> >
> >> >> > Here are the two open questions I am aware of:
> >> >> >
> >> >> > 1. We want to separate authentication and authorization. This
> >> >> > means permissions will be assigned to some user-like
> >> >> > subject/entity/person string that is independent of the
> >> >> > authentication mechanism. It sounds like we agreed this could be
> >> >> > done, and we had in mind some krb-specific mangling that Gwen
> >> >> > knew about, and I think the plan was to use whatever the user
> >> >> > chose to put in the Subject Alternative Name of the cert for SSL.
> >> >> > So in both cases these would translate to a string denoting the
> >> >> > entity to whom we are granting permissions in the authorization
> >> >> > layer. We should document these in the wiki to get feedback on
> >> >> > them.
> >> >> >
> >> >> > The Hadoop approach to extraction was something like this:
> >> >> >
> >> >> > http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14-2-3-1.html
> >> >> >
> >> >> > But actually I'm not sure if just using the full Kerberos
> >> >> > principal is so bad, i.e. having the user be
> >> >> > jennifer@athena.mit.edu versus just jennifer. Where this would
> >> >> > make a difference would be in a case where you wanted the same
> >> >> > user/entity to be able to authenticate via different mechanisms
> >> >> > (Hadoop auth, Kerberos, SSL) and have a single set of
> >> >> > permissions.
> >> >> >
> >> >> > 2. For SASL/Kerberos we need to figure out how the communication
> >> >> > between client and server will be handled to pass the
> >> >> > challenge/response byte[]. I.e.
> >> >> >
> >> >> > http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslClient.html#evaluateChallenge(byte[])
> >> >> > http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslServer.html#evaluateResponse(byte[])
> >> >> >
> >> >> > I am not a super expert in this area, but I will try to give my
> >> >> > understanding, and I'm sure someone can correct me if I am
> >> >> > confused.
> >> >> >
> >> >> > Unlike SSL, the transmission of this is actually outside the
> >> >> > scope of SASL, so we have to specify it. Two proposals:
> >> >> >
> >> >> > Original Proposal: Add a new "authenticate" request/response
> >> >> >
> >> >> > The proposal in the original wiki was to add a new "authenticate"
> >> >> > request/response to pass this information. This matches what was
> >> >> > done in the Kerberos implementation for ZooKeeper. The intention
> >> >> > is that the client would send this request immediately after
> >> >> > establishing a connection, in which case it acts much like a
> >> >> > "handshake"; however, there is no requirement that they do so.
> >> >> >
> >> >> > Whether the authentication happens via SSL or via Kerberos, the
> >> >> > effect will just be to set the username in the session. This will
> >> >> > default to the "anybody" user, so in the default non-secure case
> >> >> > we will just be defaulting "anybody" to have full permission. So,
> >> >> > to answer the question about whether changing user is required or
> >> >> > not: I don't think it is, but I think we kind of get it for free
> >> >> > in this approach.
> >> >> >
> >> >> > In this approach there is no particular need or advantage to
> >> >> > having a separate port for Kerberos, I don't think.
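[Editor's aside: the evaluateChallenge/evaluateResponse loop the javadoc
links above refer to can be sketched with the JDK's own SASL API. This is
not Kafka code; it uses CRAM-MD5 purely because the JDK ships both the
client and server side of that mechanism (Kafka would use GSSAPI/Kerberos),
and all names ("alice", "kafka", "broker1.example.com", the password) are
made up for illustration. Each byte[] passed between the two objects is
exactly what would ride inside the proposed "authenticate"
request/response, or inside the raw handshake frames.]

```java
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.sasl.AuthorizeCallback;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslServer;

public class SaslLoopSketch {

    // Runs one challenge/response negotiation in-process and returns the
    // server's view of the result.
    public static String authenticate() throws Exception {
        char[] secret = "s3cret".toCharArray(); // shared secret, made up

        // Client side: supplies the username and password when asked.
        CallbackHandler clientCbh = callbacks -> {
            for (Callback cb : callbacks) {
                if (cb instanceof NameCallback) {
                    ((NameCallback) cb).setName("alice");
                } else if (cb instanceof PasswordCallback) {
                    ((PasswordCallback) cb).setPassword(secret);
                }
            }
        };

        // Server side: supplies the expected password for the claimed user
        // and approves the authorization id (here: self-authorization only).
        CallbackHandler serverCbh = callbacks -> {
            for (Callback cb : callbacks) {
                if (cb instanceof PasswordCallback) {
                    ((PasswordCallback) cb).setPassword(secret);
                } else if (cb instanceof AuthorizeCallback) {
                    AuthorizeCallback ac = (AuthorizeCallback) cb;
                    ac.setAuthorized(ac.getAuthenticationID()
                            .equals(ac.getAuthorizationID()));
                }
            }
        };

        SaslServer server = Sasl.createSaslServer(
                "CRAM-MD5", "kafka", "broker1.example.com", null, serverCbh);
        SaslClient client = Sasl.createSaslClient(
                new String[] {"CRAM-MD5"}, null, "kafka",
                "broker1.example.com", null, clientCbh);

        // CRAM-MD5 is server-first: an empty response draws the challenge.
        byte[] challenge = server.evaluateResponse(new byte[0]);
        byte[] response = client.evaluateChallenge(challenge);
        server.evaluateResponse(response); // verifies the client's digest

        return server.isComplete() + " as " + server.getAuthorizationID();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(authenticate());
    }
}
```

Note that SASL only defines how the byte[] tokens are produced and
consumed; getting them across the wire is exactly the open question in
this thread.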
> >> >
> >> > Alternate Proposal: Create a Handshake
> >> >
> >> > The alternative I think Michael was proposing was to create a
> >> > handshake that would happen at connection time on connections coming
> >> > in on the SASL port. This would require a separate port for SASL,
> >> > since otherwise you wouldn't be able to tell whether the bytes you
> >> > were getting were for SASL or were the first request of an
> >> > unauthenticated connection.
> >> >
> >> > Michael, it would be good to work out the details of how this works.
> >> > Are we just sending size-delimited byte arrays back and forth until
> >> > the challenge/response terminates?
> >> >
> >> > My Take
> >> >
> >> > The pro I see for Michael's proposal is that it keeps the
> >> > authentication logic more localized in the socket server.
> >> >
> >> > I see two cons:
> >> > 1. Since the handshake won't go through the normal API layer, it
> >> > won't go through the normal logging (e.g. the request log), JMX
> >> > monitoring, client trace token, correlation id, etc. that we get for
> >> > other requests. This could make operations a little confusing and
> >> > make debugging a little harder, since the client will be blocking on
> >> > network requests without the normal logging.
> >> > 2. This part of the protocol will be inconsistent with the rest of
> >> > the Kafka protocol, so it will be a little odd for client
> >> > implementors, as this will effectively be a request/response that
> >> > they will have to implement that is different from all the other
> >> > request/responses they implement.
> >> >
> >> > In practice these two alternatives are not very different, except
> >> > that in the original proposal the bytes you send are prefixed by the
> >> > normal request header fields such as the client id, correlation id,
> >> > etc. Overall I would prefer this, as I think it is a bit more
> >> > consistent from the client's point of view.
> >> >
> >> > Cheers,
> >> >
> >> > -Jay
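[Editor's aside: the "size-delimited byte arrays back and forth" framing
Jay asks about can be sketched in a few lines: a 4-byte big-endian length
prefix followed by the raw SASL token, which is the same size-delimited
convention the Kafka wire protocol already uses for whole requests. The
class and token names below are illustrative, and the in-memory "wire"
stands in for a real socket.]

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class SaslFraming {

    // Writes one SASL token as a 4-byte big-endian length followed by the
    // raw bytes of the token.
    public static void writeToken(DataOutputStream out, byte[] token)
            throws IOException {
        out.writeInt(token.length);
        out.write(token);
        out.flush();
    }

    // Reads one size-delimited token; blocks until the full frame arrives.
    public static byte[] readToken(DataInputStream in) throws IOException {
        byte[] token = new byte[in.readInt()];
        in.readFully(token);
        return token;
    }

    public static void main(String[] args) throws IOException {
        // Simulate one leg of the exchange over an in-memory "wire".
        byte[] challenge =
                "step-1-challenge".getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeToken(new DataOutputStream(wire), challenge);

        byte[] received = readToken(new DataInputStream(
                new ByteArrayInputStream(wire.toByteArray())));

        System.out.println(wire.size() + " bytes on the wire, token: "
                + new String(received, StandardCharsets.UTF_8));
    }
}
```

The trade-off Jay describes is visible here: these frames carry no client
id or correlation id, so they would bypass the request log and the other
per-request machinery, whereas wrapping the same token bytes in a normal
"authenticate" request would keep them.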