kafka-dev mailing list archives

From Don Bosco Durai <bo...@apache.org>
Subject Re: Two open issues on Kafka security
Date Thu, 02 Oct 2014 17:54:27 GMT
I agree, username+IP would be sufficient. I assume that when authentication is turned off or
doesn’t exist but the authorization plugin is enabled, the username would be empty or passed
as “nobody”, but with a valid IP (if available).

> The name “context” is probably not the right one. The idea is to have an
> object into which we can easily add additional properties in the future
> to support additional authorization libraries without breaking backward
> compatibility with existing ones.
+1. Makes the design scalable.
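
To make the quoted proposal concrete, here is a minimal Java sketch of how authorize(Context, Entity, Action) might look with an extensible context object and the two-level service/topic hierarchy discussed in the quoted message. Every name in it (AuthzContext, Entity, Action, Authorizer, InMemoryAuthorizer) is hypothetical and chosen for illustration only; none of it is an existing Kafka or Sentry API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only; these types are not part of Kafka or Sentry.

/** Actions a user may attempt on an entity. */
enum Action { READ, WRITE, CREATE }

/** An entity with an optional parent, e.g. topic -> service (root). */
final class Entity {
    final String name;
    final Entity parent; // null for the root "service" entity

    Entity(String name, Entity parent) {
        this.name = name;
        this.parent = parent;
    }

    /** Full path from the root, e.g. "kafka-service/logs". */
    String path() {
        return parent == null ? name : parent.path() + "/" + name;
    }
}

/** Session information passed to the authorization plugin. */
final class AuthzContext {
    static final String ANONYMOUS = "nobody";

    private final String userName;
    private final String clientIp; // may be null if not available
    private final Map<String, Object> properties = new HashMap<>();

    AuthzContext(String userName, String clientIp) {
        // With authentication off, fall back to the anonymous user.
        this.userName = (userName == null || userName.isEmpty()) ? ANONYMOUS : userName;
        this.clientIp = clientIp;
    }

    String userName() { return userName; }
    String clientIp() { return clientIp; }

    // Extra properties (e.g. raw certificate bytes) can be added later
    // without changing the signature of authorize().
    AuthzContext withProperty(String key, Object value) {
        properties.put(key, value);
        return this;
    }

    Object property(String key) { return properties.get(key); }
}

/** The plugin interface: one call, three arguments. */
interface Authorizer {
    boolean authorize(AuthzContext context, Entity entity, Action action);
}

/** Toy implementation: a grant on a parent is inherited by its children. */
final class InMemoryAuthorizer implements Authorizer {
    private final Set<List<Object>> grants = new HashSet<>();

    void grant(String userName, Entity entity, Action action) {
        grants.add(Arrays.<Object>asList(userName, entity.path(), action));
    }

    @Override
    public boolean authorize(AuthzContext context, Entity entity, Action action) {
        // Walk from the entity up to the root, looking for a matching grant.
        for (Entity e = entity; e != null; e = e.parent) {
            if (grants.contains(Arrays.<Object>asList(context.userName(), e.path(), action))) {
                return true;
            }
        }
        return false;
    }
}
```

With this shape, granting READ on the root service entity covers every topic under it, so no special "ALL topics" rule is needed, and an unauthenticated session simply shows up as user "nobody" with whatever IP is known.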



> ----- Original message -----
> From: Jarek Jarcec Cecho <jarcec@apache.org>
> To: dev@kafka.apache.org
> Subject: Re: Two open issues on Kafka security
> Date: Thu, 2 Oct 2014 08:33:45 -0700
> Thanks for getting back Jay!
> For the interface - Looking at Sentry and other authorization libraries
> in the Hadoop ecosystem it seems that “username” is primarily used to
> perform authorization these days. And then IP for auditing. Hence I feel
> that username+IP would be sufficient, at least for now. However I would
> assume that in the future we might need more than just those two, so
> what about defining the API in a way that we can easily extend in the
> future, something like:
> authorize(Context, Entity, Action), where
> * Action - the action that the user is trying to perform (write to topic,
> read from topic, create topic, …)
> * Entity - the entity that the user is trying to perform that action on
> (topic, …)
> * Context - container with user/session information - user name, IP
> address or perhaps the entire certificate as was suggested earlier in the
> email thread.
> The name “context” is probably not the right one. The idea is to have an
> object into which we can easily add additional properties in the future
> to support additional authorization libraries without breaking backward
> compatibility with existing ones.
> The hierarchy is an interesting topic - I’m not familiar enough with Kafka
> internals so I can’t really talk about how much more complex it would
> be. I can speak about Sentry and the way we designed the security model for
> Hive and Search, where introducing the hierarchy wasn’t complex at all
> and actually led to a cleaner model. The biggest user visible benefit
> is that you don’t have to deal with special rules such as “give READ
> privilege to user jarcec to ALL topics”. If you have a singleton parent
> entity (service or whatever name seems more accurate), you can easily
> say that you have READ access on this root entity and then all
> topics will simply inherit that.
> Jarcec
> On Oct 1, 2014, at 9:33 PM, Jay Kreps <jay.kreps@gmail.com> wrote:
>> Hey Jarek,
>> I agree with the importance of separating authentication and
>> authorization. The question is what concept of identity is sufficient
>> to pass through to the authorization layer? Just a "user name"? Or
>> perhaps you also need the ip the request originated from? Whatever
>> these would be it would be nice to enumerate them so the authz portion
>> can be written in a way that ignores the authn part.
>> So if no one else proposes anything different maybe we can just say
>> user name + ip?
>> With respect to hierarchy, it would be nice to have topic hierarchies
>> but we don't have them now so seems overkill to try to think them
>> through wrt security now, right?
>> -Jay
>> On Wed, Oct 1, 2014 at 1:13 PM, Jarek Jarcec Cecho <jarcec@apache.org> wrote:
>>> I’m following the security proposal wiki page [1] and this discussion and I
>>> would like to jump in with a few points if I might :)  Let me start by saying
>>> that I like the material and the discussion here, good work!
>>> I was part of the team who originally designed and worked on Sentry and I
>>> wanted to share a few points to see how they will resonate with people. My
>>> first and probably biggest point would be to separate authorization and
>>> authentication as two separate systems. I believe that Jao has already
>>> stressed that in the email thread, but I wanted to reiterate that point. In my
>>> experience users don’t care that much about how the user has been
>>> authenticated if they trust that mechanism; what they care more about is that
>>> the authorization model is consistent and behaves the same way. E.g. if I
>>> configured that user jarcec can write into topic “logs”, he should be able to
>>> do that no matter where the connection came from - whether he has been
>>> authenticated through Kerberos as he is directly exploring the data from his
>>> computer, he is authenticated through a delegation token because he is running
>>> map reduce jobs calculating statistics, or he is authenticated through an SSL
>>> certificate because … (well I’m missing a good example here, but you’re
>>> probably following my point).
>>> I’ve also noticed that we are planning to have no hierarchy in the authz
>>> object model per the wiki [1], with the reasoning that Kafka does not support
>>> topic hierarchy. I see that point, but at the same time it got me thinking -
>>> are we sure that Kafka will never have hierarchical topics? It seems like a
>>> nice feature that might be usable for some use cases and something that we
>>> might want to add in the future. But regardless of that I would suggest
>>> introducing a hierarchy anyway, even if it would be just two levels. In Sentry
>>> (for Hive) we’ve introduced the concept of a “Service” where all the databases
>>> are children of the service. In Kafka I would imagine that we would have
>>> “service” and “topics” as the children. With this it is much easier to model
>>> general privileges where you need to grant access to all topics - you just
>>> grant access to the entire service and all topics will get it “inherited”.
>>> I’m wondering what other people’s thoughts are?
>>> Jarcec
>>> Links:
>>> 1: https://cwiki.apache.org/confluence/display/KAFKA/Security
>>> On Oct 1, 2014, at 9:44 AM, Joe Stein <joe.stein@stealth.ly> wrote:
>>>> Hi Jonathan,
>>>> "Hadoop delegation tokens to enable MapReduce, Samza, or other frameworks
>>>> running in the Hadoop environment to access Kafka"
>>>> https://cwiki.apache.org/confluence/display/KAFKA/Security is on the list,
>>>> yup!
>>>> /*******************************************
>>>> Joe Stein
>>>> Founder, Principal Consultant
>>>> Big Data Open Source Security LLC
>>>> http://www.stealth.ly
>>>> Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
>>>> ********************************************/
>>>> On Wed, Oct 1, 2014 at 12:35 PM, Jonathan Creasy <Jonathan.Creasy@turn.com>
>>>> wrote:
>>>>> This is not nearly as deep as the discussion so far, but I did want to
>>>>> throw this idea out there to make sure we've thought about it.
>>>>> The Kafka project should make sure that when deployed alongside a Hadoop
>>>>> cluster from any major distributions that it can tie seamlessly into
>>>>> authentication and authorization used within that cluster. For example,
>>>>> Apache Sentry.
>>>>> This may present additional difficulties that mean a decision is made
>>>>> not to do that, or alternatively the Kerberos authentication and the
>>>>> authorization schemes we are already working on may be sufficient.
>>>>> I'm not sure that anything I've read so far in this discussion actually
>>>>> poses a problem, but I'm an Ops guy and being able to more easily
>>>>> integrate more things makes my life better. :)
>>>>> -Jonathan
>>>>> On 9/30/14, 11:26 PM, "Joe Stein" <joe.stein@stealth.ly> wrote:
>>>>>> inline
>>>>>> On Tue, Sep 30, 2014 at 11:58 PM, Jay Kreps <jay.kreps@gmail.com> wrote:
>>>>>>> Hey Joe,
>>>>>>> For (1) what are you thinking for the PermissionManager api?
>>>>>>> The way I see it, the first question we have to answer is whether it
>>>>>>> is possible to make authentication and authorization independent. What
>>>>>>> I mean by that is whether I can write an authorization library that
>>>>>>> will work the same whether you authenticate with ssl or kerberos.
>>>>>> To me that is a requirement. We can't tie them together. We have to
>>>>>> provide the ability for authorization to work regardless of the
>>>>>> authentication. One *VERY* important use case is level of trust
>>>>>> authentication from the authorization perspective, e.g. I authorize
>>>>>> "identity" based on how you authenticated.... Alice is able to view
>>>>>> topic X if Alice authenticated over kerberos. Bob isn't allowed to view
>>>>>> topic X no matter what. Alice can authenticate over not kerberos
>>>>>> cases for that) and in that case Alice wouldn't see topic X. A concrete
>>>>>> use case for this with Kafka would be a third party bank consuming data to
>>>>>> a broker. The service provider would have some kerberos local auth
>>>>>> that bank to-do back up that would also have access to other topics
>>>>>> related to that bank's data.... the bank itself over SSL wants a stream of
>>>>>> (some specific topic) and that bank's identity only sees that topic.
>>>>>> It is important to not confuse identity, authentication and authorization.
>>>>>>> If so then we need to pick some subset of identity information that we
>>>>>>> can extract from both and have this constitute the identity we pass
>>>>>>> into the authorization interface. The original proposal had just
>>>>>>> username/subject. But maybe we should add the ip address as well if
>>>>>>> that is useful. What I would prefer not to do is add everything in the
>>>>>>> certificate. I think the assumption is that you are generating
>>>>>>> certificates for Kafka so you can put whatever identity info you want
>>>>>>> in the Subject Alternative Name. If that is true then just using that
>>>>>>> should be okay, right?
>>>>>> I think we should just push the byte[] and let the plugin deal with it.
>>>>>> So, if we have a certificate object then pass that along with whatever
>>>>>> other meta data (e.g. IP address of client) we can. I don't think we
>>>>>> should do any parsing whatsoever and let the plugin deal with that. Any
>>>>>> parsing we do on the identity information for the "security object"
>>>>>> us into specific implementations and I don't see any reason to do that.
>>>>>> If plug-ins want an "easier" time to deal with certs and parsing and blah
>>>>>> blah blah then we can implement some way they can do this without
>>>>>> fuss.... we also need to make sure that crypto library is pluggable too (so
>>>>>> we can expose an API for them to call) so that HSM can be easily
>>>>>> in without Kafka caring... so in the plugin we could provide a
>>>>>> indentity.getAlternativeAttribute() and then that use case is solved
>>>>>> we can use bouncy castle or whatever to parse it for them to make
>>>>>> easier).... and always give them raw bytes so they could do it themselves.
>>>>>>> -Jay
>>>>>>> On Tue, Sep 30, 2014 at 4:09 PM, Joe Stein <joe.stein@stealth.ly> wrote:
>>>>>>>> 1) We need to support the most flexibility we can and make it
>>>>>>>> transparent to kafka (to use Gwen's term). Any specific implementation
>>>>>>>> is going to make it not work with some solution, stopping people from
>>>>>>>> using Kafka. That is a reality because everyone just does it slightly
>>>>>>>> differently enough. If we have an "identity" byte structure (let's not
>>>>>>>> use string because some security objects are bytes) this should just
>>>>>>>> fall through to the implementor. For certs this is the entire x509
>>>>>>>> object (not just the certificate part as it could contain an ASN.1
>>>>>>>> timestamp) and inside you parse and do what you want with it.
>>>>>>>> 2) While I think there are many benefits to just the handshake
>>>>>>>> approach I don't think it outweighs the cons Jay expressed. a) We
>>>>>>>> can't lead the client libraries down a new path of interacting with
>>>>>>>> Kafka. By incrementally adding to the wire protocol we are directing a
>>>>>>>> very clear and expected approach. We already have issues with
>>>>>>>> implementation
>>>>>>>> with the wire protocol in place and are trying to improve that aspect
>>>>>>>> of the community as a whole. Let's not take a step backwards with
>>>>>>>> there... also we need to not add more/different hoops to
>>>>>>>> debugging/administering/monitoring kafka so taking advantage (as Jay
>>>>>>>> says) of built in logging (etc) is important... also for the client
>>>>>>>> developers too :)
>>>>>>>> On Tue, Sep 30, 2014 at 6:44 PM, Gwen Shapira <gshapira@cloudera.com> wrote:
>>>>>>>>> Re #1:
>>>>>>>>> Since the auth_to_local is a kerberos config, it's up to the admin to
>>>>>>>>> decide how he likes the user names and set it up properly (or leave
>>>>>>>>> empty) and make sure the ACLs match. Simplified names may be needed
>>>>>>>>> if the authorization system integrates with LDAP to get groups or
>>>>>>>>> something fancy like that.
>>>>>>>>> Note that it's completely transparent to Kafka - if the admin sets up
>>>>>>>>> auth_to_local rules, we simply see a different principal name. No
>>>>>>>>> need to do anything different.
>>>>>>>>> Gwen
>>>>>>>>> On Tue, Sep 30, 2014 at 3:31 PM, Jay Kreps <jay.kreps@gmail.com> wrote:
>>>>>>>>>> Current proposal is here:
>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/Security
>>>>>>>>>> Here are the two open questions I am aware of:
>>>>>>>>>> 1. We want to separate authentication and authorization. This means
>>>>>>>>>> permissions will be assigned to some user-like subject/entity/person
>>>>>>>>>> string that is independent of the authentication mechanism. It sounds
>>>>>>>>>> like we agreed this could be done and we had in mind krb-specific
>>>>>>>>>> mangling that Gwen knew about and I think the plan was to use whatever
>>>>>>>>>> the user chose to put in the Subject Alternative Name of the cert for
>>>>>>>>>> ssl. So in both cases these would translate to a string denoting the
>>>>>>>>>> entity whom we are granting permissions to in the authorization layer.
>>>>>>>>>> We should document these in the wiki to get feedback on them.
>>>>>>>>>> The Hadoop approach to extraction was something like
>>>>>>>>>> http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14-2-3-1.html
>>>>>>>>>> But actually I'm not sure if just using the full principal is
>>>>>>>>>> so bad? I.e. having the user be jennifer@athena.mit.edu rather than
>>>>>>>>>> just jennifer. Where this would make a difference would be in a case
>>>>>>>>>> where you wanted the same user/entity to be able to authenticate via
>>>>>>>>>> different mechanisms (Hadoop auth, kerberos, ssl) and have a single
>>>>>>>>>> set of permissions.
>>>>>>>>>> 2. For SASL/Kerberos we need to figure out how the communication
>>>>>>>>>> between client and server will be handled to pass the
>>>>>>>>>> challenge/response byte[]. I.e.
>>>>>>>>>> http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslClient.html#evaluateChallenge(byte[])
>>>>>>>>>> http://docs.oracle.com/javase/7/docs/api/javax/security/sasl/SaslServer.html#evaluateResponse(byte[])
>>>>>>>>>> I am not super expert in this area but I will try to give my
>>>>>>>>>> understanding and I'm sure someone can correct me if I am confused.
>>>>>>>>>> Unlike SSL the transmission of this is actually outside the scope of
>>>>>>>>>> SASL so we have to specify this. Two proposals:
>>>>>>>>>> Original Proposal: Add a new "authenticate" request/response
>>>>>>>>>> The proposal in the original wiki was to add a new
>>>>>>>>>> request/response to pass this information. This matches what was done
>>>>>>>>>> in the kerberos implementation for zookeeper. The intention is that
>>>>>>>>>> the client would send this request immediately after establishing a
>>>>>>>>>> connection, in which case it acts much like a "handshake", but
>>>>>>>>>> there is no requirement that they do so.
>>>>>>>>>> Whether the authentication happens via SSL or via Kerberos, the effect
>>>>>>>>>> will just be to set the username in their session. This will default
>>>>>>>>>> to the "anybody" user. So in the default non-secure case we will just
>>>>>>>>>> be defaulting "anybody" to have full permission. So to answer the
>>>>>>>>>> question about whether changing user is required or not, I don't think
>>>>>>>>>> it is but I think we kind of get it for free in this approach.
>>>>>>>>>> In this approach there is no particular need or advantage to having a
>>>>>>>>>> separate port for kerberos I don't think.
>>>>>>>>>> Alternate Proposal: Create a Handshake
>>>>>>>>>> The alternative I think Michael was proposing was to create a
>>>>>>>>>> handshake that would happen at connection time on connections coming
>>>>>>>>>> in on the SASL port. This would require a separate port for SASL since
>>>>>>>>>> otherwise you wouldn't be able to tell if the bytes you were getting
>>>>>>>>>> were for SASL or were the first request of an unauthenticated
>>>>>>>>>> connection.
>>>>>>>>>> Michael it would be good to work out the details of how this works.
>>>>>>>>>> Are we just sending size-delimited byte arrays back and forth until
>>>>>>>>>> the challenge response terminates?
>>>>>>>>>> My Take
>>>>>>>>>> The pro I see for Michael's proposal is that it keeps the
>>>>>>>>>> authentication logic more localized in the socket
>>>>>>>>>> I see two cons:
>>>>>>>>>> 1. Since the handshake won't go through the normal api layer it won't
>>>>>>>>>> go through the normal logging (e.g. request log), jmx monitoring,
>>>>>>>>>> client trace token, correlation id, etc that we get for other
>>>>>>>>>> requests. This could make operations a little confusing and make
>>>>>>>>>> debugging a little harder since the client will be blocking on network
>>>>>>>>>> requests without the normal logging.
>>>>>>>>>> 2. This part of the protocol will be inconsistent with the rest of the
>>>>>>>>>> Kafka protocol so it will be a little odd for client implementers as
>>>>>>>>>> this will effectively be a request/response that they will have to
>>>>>>>>>> implement that will be different from all the other request/responses
>>>>>>>>>> they implement.
>>>>>>>>>> In practice these two alternatives are not very different except that
>>>>>>>>>> in the original proposal the bytes you send are prefixed by the normal
>>>>>>>>>> request header fields such as the client id, correlation id, etc.
>>>>>>>>>> Overall I would prefer this as I think it is a bit more consistent
>>>>>>>>>> from the client's point of view.
>>>>>>>>>> Cheers,
>>>>>>>>>> -Jay
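
On the question in the quoted handshake proposal about sending size-delimited byte arrays back and forth, the framing itself could be as small as the Java sketch below. It is an illustration under assumptions, not a proposed wire format: the TokenFraming class, the 4-byte big-endian length prefix, and the 64 KB cap are all made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper, not part of Kafka: each SASL token travels as a
// 4-byte big-endian length followed by that many payload bytes.
final class TokenFraming {
    // Assumed upper bound so a bad length cannot trigger a huge allocation.
    static final int MAX_TOKEN_BYTES = 64 * 1024;

    /** Writes one size-delimited token to the stream. */
    static void writeToken(DataOutputStream out, byte[] token) throws IOException {
        out.writeInt(token.length);
        out.write(token);
        out.flush();
    }

    /** Reads one size-delimited token, rejecting implausible sizes. */
    static byte[] readToken(DataInputStream in) throws IOException {
        int size = in.readInt();
        if (size < 0 || size > MAX_TOKEN_BYTES) {
            throw new IOException("invalid token size: " + size);
        }
        byte[] token = new byte[size];
        in.readFully(token);
        return token;
    }
}
```

In a real exchange each side would feed the bytes it reads into SaslClient.evaluateChallenge(byte[]) or SaslServer.evaluateResponse(byte[]) and write back the result, repeating until isComplete() returns true. In the "authenticate" request variant the same token bytes would simply sit behind the normal request header instead of a bare length prefix.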
