kafka-dev mailing list archives

From Jay Kreps <jay.kr...@gmail.com>
Subject Re: Two open issues on Kafka security
Date Thu, 02 Oct 2014 16:27:23 GMT
Hey Michael,

Cool. Yeah, I think in practice there isn't a huge difference: since
Kafka requests are just length-prefixed packets, the only difference is
the presence or absence of the header fields. Having them there will
make life simpler and more consistent for client implementations, since
this will just be one more request type they can choose to implement,
and it will look like all the other request types.
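
To make the comparison concrete, here is a minimal sketch of the two
framings in Python. It assumes the usual request header layout (api key,
api version, correlation id, client id) behind a 4-byte size prefix. The
AUTHENTICATE_API_KEY value and the function names are hypothetical, chosen
only for illustration, and the token is an opaque placeholder rather than
real SASL output.

```python
import struct

# Hypothetical API key for the proposed "authenticate" request.
# This is an assumption for illustration, not an assigned protocol value.
AUTHENTICATE_API_KEY = 99


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its length as a 4-byte big-endian integer."""
    return struct.pack(">i", len(payload)) + payload


def bare_handshake_packet(token: bytes) -> bytes:
    """Handshake style: only the opaque token, length-prefixed."""
    return frame(token)


def authenticate_request(token: bytes, correlation_id: int, client_id: str) -> bytes:
    """Request style: the normal header fields, then the token as a byte array."""
    cid = client_id.encode("utf-8")
    # Header: api key (int16), api version (int16), correlation id (int32),
    # client id (int16 length followed by the bytes).
    header = struct.pack(">hhih", AUTHENTICATE_API_KEY, 0, correlation_id, len(cid)) + cid
    # Body: the token as an int32 length followed by the bytes.
    body = struct.pack(">i", len(token)) + token
    return frame(header + body)
```

Either way the token bytes on the wire are the same; the request form only
adds the header in front of them, which is what would let it be logged and
correlated like any other request.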

So let's do that.

-Jay

On Thu, Oct 2, 2014 at 9:01 AM, Michael Herstine
<mherstine@linkedin.com.invalid> wrote:
> Hi Jay,
>
> Yup, in both SASL & (non-blocking) SSL the runtime libs provide an
> “engine” abstraction that just takes in & produces buffers of bytes
> containing the authentication messages. The application is responsible for
> transmitting them… somehow. I was picturing a simple length-prefixed
> packet.
>
> Thanks for the pointer to the ZK code. I spent yesterday morning reading
> the server side & saw how it’s being done (interesting side note: SASL is
> only used for Kerberos; other authentication schemes go through a
> different mechanism).
>
> I’m all for going with the original proposal & not introducing a second
> (albeit trivial) protocol… I was laboring under the impression that we
> wanted to avoid adding new request/response types, that’s all.
>
> On 10/1/14, 9:52 PM, "Jay Kreps" <jay.kreps@gmail.com> wrote:
>
>>Here is the client side in ZK:
>>https://svn.apache.org/repos/asf/zookeeper/trunk/src/java/main/org/apache/zookeeper/client/ZooKeeperSaslClient.java
>>
>>Note how they have a special Zookeeper request API that is used to
>>send the SASL bytes (e.g. see ZooKeeperSaslClient.sendSaslPacket).
>>
>>This API follows the same protocol and rpc mechanism all their other
>>request/response types follow but it just has a simple byte[] entry
>>for the SASL token in both the request and response.
>>
>>-Jay
>>
>>On Wed, Oct 1, 2014 at 9:46 PM, Jay Kreps <jay.kreps@gmail.com> wrote:
>>> Hey Michael,
>>>
>>> WRT question 2, I think for SASL you do need the mechanism information
>>> but what I was talking about was the challenge/response byte[] that is
>>> sent back and forth from the client to the server. My understanding is
>>> that SASL gives you an api for the client and server to use to produce
>>> these byte[]'s but doesn't actually specify any way of exchanging them
>>> (that is protocol specific). I could be wrong here since my knowledge
>>> of this stuff is pretty weak. But according to my understanding you
>>> must be imagining some protocol for exchanging challenge/response
>>> information. This protocol would have to be clearly documented for
>>> client implementors. What is that protocol?
>>>
>>> -Jay
>>>
>>> On Wed, Oct 1, 2014 at 2:36 PM, Michael Herstine
>>> <mherstine@linkedin.com.invalid> wrote:
>>>> Regarding question #1, I’m not sure I follow you, Joe: you’re proposing
>>>> (I think) that the API take a byte[], but what will be in that array? A
>>>> serialized certificate if the client authenticated via SSL and the
>>>> principal name (perhaps normalized) if the client authenticated via
>>>> Kerberos?
>>>>
>>>> Regarding question #2, I think I was unclear in the meeting yesterday: I
>>>> was proposing a separate port for each authentication method (including
>>>> none). That is, if a client wants no authentication, then they would
>>>> connect to port N on the broker. If they wanted to talk over SSL, then
>>>> they connect to port N+1 (say). Kerberos: N+2. This would remove the need
>>>> for a new request, since the authentication type would be implicit in the
>>>> port on which the client connected (and it was my understanding that it
>>>> was desirable to not introduce any new messages).
>>>>
>>>> Perhaps the confusion comes from the fact, correctly pointed out by Jay,
>>>> that when you want to use SASL on a single port, there does of course need
>>>> to be a way for the incoming client to signal which mechanism it wants to
>>>> use, and that’s out of scope of the SASL spec. I didn’t see there being a
>>>> desire to add new SASL mechanisms going forward, but perhaps I was
>>>> incorrect?
>>>>
>>>> In any event, I’d like to suggest we keep the “open” or “no auth” port
>>>> separate, both to make it easy for admins to force the use of security (by
>>>> shutting down that port) and to avoid downgrade attacks (where an attacker
>>>> intercepts the opening packet from a client requesting security & alters
>>>> it to request none).
>>>>
>>>> I’ll update the Wiki with my notes from yesterday’s meeting this afternoon.
>>>>
>>>> Thanks,
>>>>
>>>> On 10/1/14, 9:35 AM, "Jonathan Creasy" <Jonathan.Creasy@turn.com> wrote:
>>>>
>>>>>This is not nearly as deep as the discussion so far, but I did want to
>>>>>throw this idea out there to make sure we've thought about it.
>>>>>
>>>>>The Kafka project should make sure that, when deployed alongside a Hadoop
>>>>>cluster from any major distribution, it can tie seamlessly into the
>>>>>authentication and authorization used within that cluster. For example,
>>>>>Apache Sentry.
>>>>>
>>>>>This may present additional difficulties that mean a decision is made to
>>>>>not do that, or alternatively the Kerberos authentication and the
>>>>>authorization schemes we are already working on may be sufficient.
>>>>>
>>>>>I'm not sure that anything I've read so far in this discussion actually
>>>>>poses a problem, but I'm an Ops guy, and being able to more easily
>>>>>integrate more things makes my life better. :)
>>>>>
>>>>>-Jonathan
>>>>>
>>>>>On 9/30/14, 11:26 PM, "Joe Stein" <joe.stein@stealth.ly> wrote:
>>>>>
>>>>>>inline
>>>>>>
>>>>>>On Tue, Sep 30, 2014 at 11:58 PM, Jay Kreps <jay.kreps@gmail.com> wrote:
>>>>>>
>>>>>>> Hey Joe,
>>>>>>>
>>>>>>> For (1) what are you thinking for the PermissionManager api?
>>>>>>>
>>>>>>> The way I see it, the first question we have to answer is whether it
>>>>>>> is possible to make authentication and authorization independent. What
>>>>>>> I mean by that is whether I can write an authorization library that
>>>>>>> will work the same whether you authenticate with ssl or kerberos.
>>>>>>
>>>>>>
>>>>>>To me that is a requirement. We can't tie them together. We have to
>>>>>>provide the ability for authorization to work regardless of the
>>>>>>authentication. One *VERY* important use case is level of trust in
>>>>>>authentication from the authorization perspective, e.g. I authorize
>>>>>>"identity" based on how you authenticated. Alice is able to view
>>>>>>topic X if Alice authenticated over kerberos. Bob isn't allowed to view
>>>>>>topic X no matter what. Alice can authenticate over something other than
>>>>>>kerberos (there are use cases for that), and in that case Alice wouldn't
>>>>>>see topic X. A concrete use case for this with Kafka would be a third
>>>>>>party bank consuming data from a broker. The service provider would have
>>>>>>some kerberos local auth for that bank to do backup that would also have
>>>>>>access to other topics related to that bank's data. The bank itself,
>>>>>>over SSL, wants a stream of events (some specific topic), and that bank's
>>>>>>identity only sees that topic. It is important to not confuse identity,
>>>>>>authentication and authorization.
>>>>>>
>>>>>>
>>>>>>> If
>>>>>>> so then we need to pick some subset of identity information that we
>>>>>>> can extract from both and have this constitute the identity we pass
>>>>>>> into the authorization interface. The original proposal had just the
>>>>>>> username/subject. But maybe we should add the ip address as well, as
>>>>>>> that is useful. What I would prefer not to do is add everything in the
>>>>>>> certificate. I think the assumption is that you are generating these
>>>>>>> certificates for Kafka so you can put whatever identity info you want
>>>>>>> in the Subject Alternative Name. If that is true then just using that
>>>>>>> should be okay, right?
>>>>>>>
>>>>>>
>>>>>>I think we should just push the byte[] and let the plugin deal with it.
>>>>>>So, if we have a certificate object then pass that along with whatever
>>>>>>other meta data (e.g. IP address of client) we can. I don't think we
>>>>>>should do any parsing whatsoever; let the plugin deal with that. Any
>>>>>>parsing we do on the identity information for the "security object"
>>>>>>forces us into specific implementations and I don't see any reason to do
>>>>>>that. If plug-ins want an "easier" time dealing with certs and parsing
>>>>>>and blah blah blah, then we can implement some way they can do this
>>>>>>without much fuss. We also need to make sure that the crypto library is
>>>>>>pluggable too (so we can expose an API for them to call) so that an HSM
>>>>>>can be easily dropped in without Kafka caring. So in the plugin we could
>>>>>>provide an identity.getAlternativeAttribute() and then that use case is
>>>>>>solved (and we can use bouncy castle or whatever to parse it for them to
>>>>>>make it easier), and always give them raw bytes so they could do it
>>>>>>themselves.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> -Jay
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 30, 2014 at 4:09 PM, Joe Stein <joe.stein@stealth.ly> wrote:
>>>>>>> > 1) We need to support the most flexibility we can and make this
>>>>>>> > transparent to kafka (to use Gwen's term). Any specific implementation
>>>>>>> > is going to make it not work with some solution, stopping people from
>>>>>>> > using Kafka. That is a reality because everyone does it just
>>>>>>> > differently enough. If we have an "identity" byte structure (let's not
>>>>>>> > use string because some security objects are bytes) this should just
>>>>>>> > fall through to the implementor. For certs this is the entire x509
>>>>>>> > object (not just the certificate part as it could contain an ASN.1
>>>>>>> > timestamp) and inside you parse and do what you want with it.
>>>>>>> >
>>>>>>> > 2) While I think there are many benefits to just the handshake
>>>>>>> > approach, I don't think they outweigh the cons Jay expressed. a) We
>>>>>>> > can't lead the client libraries down a new path of interacting with
>>>>>>> > Kafka. By incrementally adding to the wire protocol we are directing a
>>>>>>> > very clear and expected approach. We already have issues with
>>>>>>> > implementation even with the wire protocol in place and are trying to
>>>>>>> > improve that aspect of the community as a whole. Let's not take a step
>>>>>>> > backwards with this. Also, we need to not add more/different hoops to
>>>>>>> > debugging/administering/monitoring kafka, so taking advantage (as Jay
>>>>>>> > says) of built in logging (etc) is important, for the client library
>>>>>>> > developers too :)
>>>>>>> >
>>>>>>> > On Tue, Sep 30, 2014 at 6:44 PM, Gwen Shapira <gshapira@cloudera.com> wrote:
>>>>>>> >
>>>>>>> >> Re #1:
>>>>>>> >>
>>>>>>> >> Since the auth_to_local is a kerberos config, it's up to the admin to
>>>>>>> >> decide how he likes the user names and set it up properly (or leave
>>>>>>> >> empty) and make sure the ACLs match. Simplified names may be needed if
>>>>>>> >> the authorization system integrates with LDAP to get groups or
>>>>>>> >> something fancy like that.
>>>>>>> >>
>>>>>>> >> Note that it's completely transparent to Kafka - if the admin sets up
>>>>>>> >> auth_to_local rules, we simply see a different principal name. No need
>>>>>>> >> to do anything different.
>>>>>>> >>
>>>>>>> >> Gwen
>>>>>>> >>
>>>>>>> >> On Tue, Sep 30, 2014 at 3:31 PM, Jay Kreps <jay.kreps@gmail.com> wrote:
>>>>>>> >> > Current proposal is here:
>>>>>>> >> >
>>>>>>> >> > https://cwiki.apache.org/confluence/display/KAFKA/Security
>>>>>>> >> >
>>>>>>> >> > Here are the two open questions I am aware of:
>>>>>>> >> >
>>>>>>> >> > 1. We want to separate authentication and authorization. This means
>>>>>>> >> > permissions will be assigned to some user-like subject/entity/person
>>>>>>> >> > string that is independent of the authentication mechanism. It sounds
>>>>>>> >> > like we agreed this could be done and we had in mind some krb-specific
>>>>>>> >> > mangling that Gwen knew about and I think the plan was to use whatever
>>>>>>> >> > the user chose to put in the Subject Alternative Name of the cert for
>>>>>>> >> > ssl. So in both cases these would translate to a string denoting the
>>>>>>> >> > entity whom we are granting permissions to in the authorization layer.
>>>>>>> >> > We should document these in the wiki to get feedback on them.
>>>>>>> >> > The Hadoop approach to extraction was something like this:
>>>>>>> >> >
>>>>>>> >> > http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14-2-3-1.html
>>>>>>> >> >
>>>>>>> >> > But actually I'm not sure if just using the full kerberos principal is
>>>>>>> >> > so bad? I.e. having the user be jennifer@athena.mit.edu versus just
>>>>>>> >> > jennifer. Where this would make a difference would be in a case where
>>>>>>> >> > you wanted the same user/entity to be able to authenticate via
>>>>>>> >> > different mechanisms (Hadoop auth, kerberos, ssl) and have a single
>>>>>>> >> > set of permissions.
>>>>>>> >> >
>>>>>>> >> > 2. For SASL/Kerberos we need to figure out how the communication
>>>>>>> >> > between client and server will be handled to pass the
>>>>>>> >> > challenge/response byte[]. I.e.
>>>>>>> >> >
>>>>>>> >> > http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslClient.html#evaluateChallenge(byte[])
>>>>>>> >> >
>>>>>>> >> > http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslServer.html#evaluateResponse(byte[])
>>>>>>> >> >
>>>>>>> >> > I am not super expert in this area but I will try to give my
>>>>>>> >> > understanding and I'm sure someone can correct me if I am confused.
>>>>>>> >> >
>>>>>>> >> > Unlike SSL the transmission of this is actually outside the scope of
>>>>>>> >> > SASL so we have to specify this. Two proposals
>>>>>>> >> >
>>>>>>> >> > Original Proposal: Add a new "authenticate" request/response
>>>>>>> >> >
>>>>>>> >> > The proposal in the original wiki was to add a new "authenticate"
>>>>>>> >> > request/response to pass this information. This matches what was done
>>>>>>> >> > in the kerberos implementation for zookeeper. The intention is that
>>>>>>> >> > the client would send this request immediately after establishing a
>>>>>>> >> > connection, in which case it acts much like a "handshake"; however,
>>>>>>> >> > there is no requirement that they do so.
>>>>>>> >> >
>>>>>>> >> > Whether the authentication happens via SSL or via Kerberos, the effect
>>>>>>> >> > will just be to set the username in their session. This will default
>>>>>>> >> > to the "anybody" user. So in the default non-secure case we will just
>>>>>>> >> > be defaulting "anybody" to have full permission. So to answer the
>>>>>>> >> > question about whether changing user is required or not, I don't think
>>>>>>> >> > it is but I think we kind of get it for free in this approach.
>>>>>>> >> >
>>>>>>> >> > In this approach there is no particular need or advantage to having a
>>>>>>> >> > separate port for kerberos I don't think.
>>>>>>> >> >
>>>>>>> >> > Alternate Proposal: Create a Handshake
>>>>>>> >> >
>>>>>>> >> > The alternative I think Michael was proposing was to create a
>>>>>>> >> > handshake that would happen at connection time on connections coming
>>>>>>> >> > in on the SASL port. This would require a separate port for SASL since
>>>>>>> >> > otherwise you wouldn't be able to tell if the bytes you were getting
>>>>>>> >> > were for SASL or were the first request of an unauthenticated
>>>>>>> >> > connection.
>>>>>>> >> >
>>>>>>> >> > Michael it would be good to work out the details of how this works.
>>>>>>> >> > Are we just sending size-delimited byte arrays back and forth until
>>>>>>> >> > the challenge response terminates?
>>>>>>> >> >
>>>>>>> >> > My Take
>>>>>>> >> >
>>>>>>> >> > The pro I see for Michael's proposal is that it keeps the
>>>>>>> >> > authentication logic more localized in the socket server.
>>>>>>> >> >
>>>>>>> >> > I see two cons:
>>>>>>> >> > 1. Since the handshake won't go through the normal api layer it won't
>>>>>>> >> > go through the normal logging (e.g. request log), jmx monitoring,
>>>>>>> >> > client trace token, correlation id, etc that we get for other
>>>>>>> >> > requests. This could make operations a little confusing and make
>>>>>>> >> > debugging a little harder since the client will be blocking on network
>>>>>>> >> > requests without the normal logging.
>>>>>>> >> > 2. This part of the protocol will be inconsistent with the rest of the
>>>>>>> >> > Kafka protocol so it will be a little odd for client implementors as
>>>>>>> >> > this will effectively be a request/response that they will have to
>>>>>>> >> > implement that will be different from all the other request/responses
>>>>>>> >> > they implement.
>>>>>>> >> >
>>>>>>> >> > In practice these two alternatives are not very different except that
>>>>>>> >> > in the original proposal the bytes you send are prefixed by the normal
>>>>>>> >> > request header fields such as the client id, correlation id, etc.
>>>>>>> >> > Overall I would prefer this as I think it is a bit more consistent
>>>>>>> >> > from the client's point of view.
>>>>>>> >> >
>>>>>>> >> > Cheers,
>>>>>>> >> >
>>>>>>> >> > -Jay
