kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rajini Sivaram <rajinisiva...@gmail.com>
Subject Re: [DISCUSS] KIP 368: Allow SASL Connections to Periodically Re-Authenticate
Date Thu, 06 Sep 2018 09:17:44 GMT
Hi Ron,

Disconnections on the broker-side: I think we should do disconnections as a
separate KIP and PR as you originally intended. But that one could be done
separately without requiring KIP-368 as a pre-req. As a simpler
implementation and one that can be used without KIP-368 in some cases, we
could commit that first since this one may take longer.

I wasn't suggesting that we do move re-authentication to NetworkClient, I
was thinking of the lower layers, handling authentication and
reauthentication at the same layer in a similar way. Let me look into the
code and come up with a more detailed explanation to avoid confusion.

I am not too concerned about the imports in KafkaChannel. As you said, we
can improve that if we need to. KafkaChannels are aware of
network/authentication states and if that becomes a bit more complex, I
don't think it would matter so much. My concern is about changes like
https://github.com/apache/kafka/pull/5582/files#diff-987fef43991384a3ebec5fb55e53b577
in ControllerChannelManager. Those classes shouldn't have deal with SASL or
reauthentication. Anyway, I will get back with more detail on what I had in
mind since that may not be viable at all.



On Thu, Sep 6, 2018 at 1:44 AM, Ron Dagostino <rndgstn@gmail.com> wrote:

> I just thought of another alternative if the imports are the concern.
> KafkaChannel could expose the fact that it can create an additional
> Authenticator instance on the side (what I referred to as
> notYetAuthenticatedAuthenticator in the PR) and it could let
> kafka.server.KafkaApis drive the whole thing -- create the instance on the
> side, clean it up if it fails, move it into place and close the old one if
> it succeeds, etc.  Then KafkaChannel wouldn't need to import anything new
> -- it just exposes its Authenticator and the ability to perform the swap
> upon success, etc.
>
> Ron
>
> On Wed, Sep 5, 2018 at 5:01 PM Ron Dagostino <rndgstn@gmail.com> wrote:
>
> > Hi again, Rajini,  I realized a couple of potential concerns with using
> > the TransportLayer directly during re-authentication.   First, in the
> > blocking I/O use case, the owner of the NetworkClient instance calls
> > NetworkClientUtils.sendAndReceive() to send requests.   This method
> > assumes the caller has exclusive access to the NetworkClient, so it does
> > not check to see if the node is ready; it just sends the request and
> > repeatedly calls poll() until the response arrives.  if we were to take
> > the connection temporarily offline that method currently has no mechanism
> > for checking for such a state before it sends the request; we could add
> it,
> > but we would have to put in some kind of sleep loop to keep checking for
> it
> > to come back "online" before it could send.  Adding such a sleep loop
> could
> > be done, of course, but it doesn't sound ideal.
> >
> > A similar sleep loop situation exists in the async use case.  The owner
> > repeatedly calls poll() in a loop, but if the connection to the node is
> > temporarily offline then the poll() method would have to enter a sleep
> > loop until either the connection comes back online or the timeout elapses
> > (whichever comes first).
> >
> > I don't know if there is an aversion to adding sleep loops like that, so
> > maybe it isn't a big issue, but I wanted to raise it as a potential
> concern
> > with this approach.
> >
> > Also, is the import of SASL-specific classes in KafkaChannel a major
> > objection to the current implementation?  I could eliminate that by
> > replacing the 2 offending methods in KafkaChannel with this one and
> > having the implementation delegate to the authenticator:
> >
> >     /**
> >      * Respond to a re-authentication request. This occurs on the
> >      * Server side of the re-authentication dance (i.e. on the broker).
> >      *
> >      * @param requestHeader
> >      *            the request header
> >      * @param request
> >      *            the request to process
> >      * @return the response to return to the client
> >      */
> >     public AbstractResponse respondToReauthenticationReque
> st(RequestHeader
> > requestHeader,
> >             AbstractRequest request)
> >
> > There is practically no work being done in the KafkaChannel instance
> > anyway -- it does some sanity checking but otherwise delegates to the
> > authenticator.  We could just add a method to the Authenticator interface
> > and delegate the whole thing.
> >
> > Ron
> >
> > On Wed, Sep 5, 2018 at 2:07 PM Ron Dagostino <rndgstn@gmail.com> wrote:
> >
> >> Hi Rajini.  I'm now skeptical of my "ConnectionState.REAUTHENTICATING"
> idea.
> >> The concept of a connection being "READY" or not can impact
> >> ConsumerCoordinator (see, for example,
> >> https://github.com/apache/kafka/blob/trunk/clients/src/
> main/java/org/apache/kafka/clients/consumer/internals/
> ConsumerCoordinator.java#L352).
> >> The semantics of a connection being "READY" and the
> >> implications/assumptions are not clear, and I suspect there will be some
> >> unintended consequences of this approach that may not be immediately
> >> apparent.
> >>
> >> I will confess that when I was implementing re-authentication I thought
> >> it might be worthwhile to unify the authentication-related code bases --
> >> except I suspected it would be good to go in the other direction: have
> the
> >> current code that directly uses the TransportLayer instead use
> >> AuthenticationSuccessOrFailureReceiver and
> AuthenticationRequestEnqueuer.
> >> I'm not advocating that we do it -- I decided to not go there when
> creating
> >> the PR, after all -- but I did get a strong feeling that directly using
> the
> >> TransportLayer as is currently done is really only viable before anybody
> >> else starts using the connection.  If we want to use the TransportLayer
> again
> >> after that point then it is up to us to somehow take the connection
> >> "temporarily offline" so that we have exclusive rights to it again, and
> I
> >> wonder if the concept of a connection being "temporarily offline" is
> >> something the existing code is able to handle -- probably not, and I
> >> suspect there are unstated assumptions that would be invalidated.
> >>
> >> Do you think this particular "ConnectionState.REAUTHENTICATING" idea is
> >> worth pursuing?  How about the general idea of trying to use the
> >> TransportLayer directly -- are you still feeling like it is viable?
> >>
> >> Ron
> >>
> >>
> >>
> >> Ron
> >>
> >> On Wed, Sep 5, 2018 at 11:40 AM Ron Dagostino <rndgstn@gmail.com>
> wrote:
> >>
> >>> <<<in favor of implementing server-side kill in addition to
> >>> re-authentication, not as a replacement.
> >>> <<<I think Rajini suggested the same thing
> >>> Oh, ok, I misunderstood.  Then I think we are on the same page: if we
> >>> are going to do re-authentication, then we should also do
> >>> server-side-kill-upon-expiration as part of the same implementation.
> I'm
> >>> good with that.
> >>>
> >>> I am also looking into Rajini's idea of doing re-authentication at the
> >>> NetworkClient level and reusing the existing authentication code
> path.  I
> >>> was skeptical when she suggested it, but now that I look closer I see
> >>> something that I can try.  NetworkClient has logic to recognize the
> >>> state "ConnectionState.CONNECTING" as meaning "you can't do anything
> >>> with this connection at the moment; please wait."  I'm going to try
> adding
> >>> a new state "ConnectionState.REAUTHENTICATING" that would be
> recognized
> >>> in a similar way.  Then the challenge becomes inserting myself into any
> >>> existing flow that might be going on.  I'll probably add the request
> to set
> >>> the state to "REAUTHENTICATING" to a queue if I can't grab the state
> >>> immediately and have the network client's poll() method check at the
> >>> end to see if any such requests can be granted; there would be a
> callback
> >>> associated with the request, and that way I can be assured I would be
> >>> granted the request in a reasonable amount of time (assuming the
> connection
> >>> doesn't close in the meantime).  Then it would be up the callback
> >>> implementation to perform the re-authentication dance and set the state
> >>> back to "ConnectionState.READY".  I don't know if this will work, and
> >>> I'm probably missing some subtleties at the moment, but I'll give it a
> shot
> >>> and see what happens.
> >>>
> >>> Ron
> >>>
> >>> On Wed, Sep 5, 2018 at 11:23 AM Colin McCabe <cmccabe@apache.org>
> wrote:
> >>>
> >>>> On Wed, Sep 5, 2018, at 07:34, Ron Dagostino wrote:
> >>>> > I added a "How To Support Re-Authentication for Other SASL
> Mechanisms"
> >>>> > section to the KIP as Rajini suggested.  I also added a "Rejected
> >>>> > Alternative" for the idea of forcibly closing connections on the
> >>>> client
> >>>> > side upon token refresh or on the server side upon token expiration.
> >>>> It
> >>>> > may be a bit premature to reject the server-side kill scenario
given
> >>>> that
> >>>> > Colin and Rajini are partial to it, but see below for what I said
> >>>> about it,
> >>>> > and I think it makes sense -- server-side kill without an ability
> for
> >>>> the
> >>>> > client to re-authenticate to avoid it may be useful in certain
> >>>> specific
> >>>> > cases, but as a general feature it doesn't really work.  I would
be
> >>>> willing
> >>>> > to add server-side-kill to the scope of this KIP if that is desired.
> >>>>
> >>>> Hi Ron,
> >>>>
> >>>> To clarify, I am in favor of implementing server-side kill in addition
> >>>> to re-authentication, not as a replacement.  I think Rajini suggested
> the
> >>>> same thing.
> >>>>
> >>>> It seems clear that server-side kill is needed to provide security.
> >>>> Otherwise a bad client can simply decide not to re-authenticate, and
> >>>> continue using server resources indefinitely.  Neither authentication
> nor
> >>>> re-authentication should be optional, or else the bad guys will
> simply take
> >>>> the option not to authenticate.
> >>>>
> >>>> best,
> >>>> Colin
> >>>>
> >>>>
> >>>> >
> >>>> > A brute-force alternative is to simply kill the connection on the
> >>>> client
> >>>> > > side when the background login thread refreshes the credential.
> The
> >>>> > > advantage is that we don't need a code path for re-authentication
> –
> >>>> the
> >>>> > > client simply connects again to replace the connection that
was
> >>>> killed.
> >>>> > > There are many disadvantages, though.  The approach is harsh
–
> >>>> having
> >>>> > > connections pulled out from underneath the client will introduce
> >>>> latency
> >>>> > > while the client reconnects; it introduces non-trivial resource
> >>>> utilization
> >>>> > > on both the client and server as TLS is renegotiated; and
it
> forces
> >>>> the
> >>>> > > client to periodically "recover" from what essentially looks
like
> a
> >>>> failure
> >>>> > > scenario.  While these are significant disadvantages, the
most
> >>>> significant
> >>>> > > disadvantage of all is that killing connections on the client
side
> >>>> adds no
> >>>> > > security – trusting the client to kill its connection in
a timely
> >>>> fashion
> >>>> > > is a blind and unjustifiable trust.
> >>>> > >
> >>>> >
> >>>> >
> >>>> > > We could kill the connection from the server side instead,
when
> the
> >>>> token
> >>>> > > expires.  But in this case, if there is no ability for the
client
> to
> >>>> > > re-authenticate to avoid the killing of the connection in
the
> first
> >>>> place,
> >>>> > > then we still have all of the harsh approach disadvantages
> >>>> mentioned above.
> >>>> >
> >>>> >
> >>>> > Ron
> >>>> >
> >>>> > On Wed, Sep 5, 2018 at 10:25 AM Colin McCabe <cmccabe@apache.org>
> >>>> wrote:
> >>>> >
> >>>> > > On Wed, Sep 5, 2018, at 01:41, Rajini Sivaram wrote:
> >>>> > > > *Re-authentication vs disconnection:*
> >>>> > > > In a vast number of secure Kafka deployments, SASL_SSL
is the
> >>>> security
> >>>> > > > protocol (this is the recommended config for OAUTHBEARER).
If we
> >>>> require
> >>>> > > > disconnections on token expiry, we would need new connections
to
> >>>> be
> >>>> > > > established with an expensive SSL handshake. This adds
load on
> >>>> the broker
> >>>> > > > and will result in a latency spike. For OAUTHBEARER in
> >>>> particular, when
> >>>> > > > tokens are used to make authorisation decisions, we want
to be a
> >>>> able to
> >>>> > > > support short-lived tokens where token lifetime (granting
> >>>> authorisation)
> >>>> > > is
> >>>> > > > small. To make this usable in practice, I believe we
need to
> >>>> support
> >>>> > > > re-authentication of existing connections.
> >>>> > >
> >>>> > > Hi Rajini,
> >>>> > >
> >>>> > > Thanks for the explanation.  That makes sense.
> >>>> > >
> >>>> > > >
> >>>> > > > *Also explicitly out-of-scope for this proposal is the
ability
> for
> >>>> > > brokers
> >>>> > > > to close connections that continue to use expired credentials.
> >>>> This
> >>>> > > > ability is a natural next step, but it will be addressed
via a
> >>>> separate
> >>>> > > KIP
> >>>> > > > if/when this one is adopted.*
> >>>> > > > Perhaps we could do this the other way round? I don't
think we
> >>>> would ever
> >>>> > > > want to close connections on the client-side to support
expired
> >>>> > > credentials
> >>>> > > > because that doesn't add any security guarantees. But
we do
> >>>> require the
> >>>> > > > ability for brokers to close connections in order to
enforce
> >>>> credential
> >>>> > > > expiry. Disconnection on the broker-side may be sufficient
for
> >>>> some
> >>>> > > > deployments and could be useful on its own. It would
also be the
> >>>> easier
> >>>> > > > implementation. So maybe that could be the first step?
> >>>> > >
> >>>> > > +1 for doing disconnection first.  Otherwise, as you noted,
there
> >>>> are no
> >>>> > > security guarantees -- the client can just decide not to
> >>>> re-authenticate
> >>>> > > and keep using the old credentials.  You don't even need to
modify
> >>>> the
> >>>> > > source code -- older clients would behave this way.
> >>>> > >
> >>>> > > best,
> >>>> > > Colin
> >>>> > >
> >>>> > > >
> >>>> > > > *The implementation is designed in such a way that it
does not
> >>>> preclude
> >>>> > > > adding support for re-authentication of other SASL mechanism
> >>>> (e.g. PLAIN,
> >>>> > > > SCRAM-related, and GSSAPI), but doing so is explicitly
> >>>> out-of-scope for
> >>>> > > > this proposal. *
> >>>> > > > Isn't re-authentication driven by ExpiringCredential?
We don't
> >>>> need to
> >>>> > > > support re-authentication by default for the other mechanisms
in
> >>>> this
> >>>> > > KIP,
> >>>> > > > but any mechanism could enable this by adding a custom
login
> >>>> callback
> >>>> > > > handler to provide an ExpiringCredential? For disconnection
as
> >>>> well as
> >>>> > > > re-authentication, it will be good if we can specify
exactly how
> >>>> it can
> >>>> > > be
> >>>> > > > implemented for each of the SASL mechanisms, even if
we actually
> >>>> > > implement
> >>>> > > > it only for OAUTHBEARER.
> >>>> > > >
> >>>> > > >
> >>>> > > > On Wed, Sep 5, 2018 at 2:43 AM, Colin McCabe <
> cmccabe@apache.org>
> >>>> wrote:
> >>>> > > >
> >>>> > > > > On Tue, Sep 4, 2018, at 17:43, Ron Dagostino wrote:
> >>>> > > > > > Hi Colin.  Different organizations will rely
on different
> >>>> token
> >>>> > > > > lifetimes,
> >>>> > > > > > but anything shorter than an hour feels like
it would be
> >>>> pretty
> >>>> > > > > > aggressive.  An hour or more will probably
be most common.
> >>>> > > > >
> >>>> > > > > Thanks.  That's helpful to give me a sense of what
the
> >>>> performance
> >>>> > > impact
> >>>> > > > > might be.
> >>>> > > > >
> >>>> > > > > >
> >>>> > > > > > <<<alternate solution of terminating
connections when the
> >>>> bearer
> >>>> > > token
> >>>> > > > > > changed
> >>>> > > > > > I may be mistaken, but I think you are suggesting
here that
> we
> >>>> > > forcibly
> >>>> > > > > > kill connections from the client side whenever
the
> background
> >>>> Login
> >>>> > > > > refresh
> >>>> > > > > > thread refreshes the token (which it currently
does so that
> >>>> the
> >>>> > > client
> >>>> > > > > can
> >>>> > > > > > continue to make new connections).  Am I correct?
> >>>> > > > >
> >>>> > > > > Yes, this is what I'm thinking about.  We could
also terminate
> >>>> the
> >>>> > > > > connection on the server, if that is more convenient.
> >>>> > > > >
> >>>> > > > > >  If that is what you are
> >>>> > > > > > referring to, my sense is that it would be
a very crude way
> of
> >>>> > > dealing
> >>>> > > > > with
> >>>> > > > > > the issue that would probably lead to dissatisfaction
in
> some
> >>>> sense
> >>>> > > > > (though
> >>>> > > > > > I can't be sure).
> >>>> > > > >
> >>>> > > > > What information should we gather so that we can
be sure?
> >>>> > > > >
> >>>> > > > > >  I do know that when I implemented SASL/OAUTHBEARER
it
> >>>> > > > > > was communicated that leaving existing connections
intact --
> >>>> as is
> >>>> > > done
> >>>> > > > > for
> >>>> > > > > > GSSAPI -- was the appropriate path forward.
> >>>> > > > >
> >>>> > > > > Thanks, that's good background information.  Can
someone chime
> >>>> in with
> >>>> > > the
> >>>> > > > > reasoning behind this?
> >>>> > > > >
> >>>> > > > > My best guess is that terminating connections might
cause a
> >>>> temporary
> >>>> > > > > increase in latency as they are re-established.
> >>>> > > > >
> >>>> > > > > In any case, we should figure out what the reasoning
is so
> that
> >>>> we can
> >>>> > > > > make a decision.  It seems worthwhile including
this as a
> >>>> "rejected
> >>>> > > > > alternative," at least.
> >>>> > > > >
> >>>> > > > > thanks,
> >>>> > > > > Colin
> >>>> > > > >
> >>>> > > > >
> >>>> > > > > >
> >>>> > > > > > Ron
> >>>> > > > > >
> >>>> > > > > > On Tue, Sep 4, 2018 at 8:31 PM Colin McCabe
<
> >>>> cmccabe@apache.org>
> >>>> > > wrote:
> >>>> > > > > >
> >>>> > > > > > > Hi Ron,
> >>>> > > > > > >
> >>>> > > > > > > Thanks for the KIP.
> >>>> > > > > > >
> >>>> > > > > > > What is the frequency at which you envision
bearer tokens
> >>>> changing
> >>>> > > at?
> >>>> > > > > > >
> >>>> > > > > > > Did you consider the alternate solution
of terminating
> >>>> connections
> >>>> > > when
> >>>> > > > > > > the bearer token changed?
> >>>> > > > > > >
> >>>> > > > > > > best,
> >>>> > > > > > > Colin
> >>>> > > > > > >
> >>>> > > > > > >
> >>>> > > > > > > On Tue, Aug 28, 2018, at 07:28, Ron Dagostino
wrote:
> >>>> > > > > > > > Hi everyone. I created KIP 368: Allow
SASL Connections
> to
> >>>> > > > > Periodically
> >>>> > > > > > > > Re-Authenticate
> >>>> > > > > > > > <
> >>>> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>> > > > > 368%3A+Allow+SASL+Connections+to+Periodically+Re-Authenticate
> >>>> > > > > > > >
> >>>> > > > > > > > (
> >>>> > > > > > > >
> >>>> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>> > > > > 368%3A+Allow+SASL+Connections+to+Periodically+Re-Authenticate
> >>>> > > > > > > ).
> >>>> > > > > > > > The motivation for this KIP is as
follows:
> >>>> > > > > > > >
> >>>> > > > > > > > The adoption of KIP-255: OAuth Authentication
via
> >>>> > > SASL/OAUTHBEARER
> >>>> > > > > > > > <
> >>>> > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.
> >>>> > > > > action?pageId=75968876>
> >>>> > > > > > >
> >>>> > > > > > > > in
> >>>> > > > > > > > release 2.0.0 creates the possibility
of using
> >>>> information in the
> >>>> > > > > bearer
> >>>> > > > > > > > token to make authorization decisions.
 Unfortunately,
> >>>> however,
> >>>> > > Kafka
> >>>> > > > > > > > connections are long-lived, so there
is no ability to
> >>>> change the
> >>>> > > > > bearer
> >>>> > > > > > > > token associated with a particular
connection.  Allowing
> >>>> SASL
> >>>> > > > > > > > connections
> >>>> > > > > > > > to periodically re-authenticate would
resolve this.  In
> >>>> addition
> >>>> > > to
> >>>> > > > > this
> >>>> > > > > > > > motivation there are two others that
are
> security-related.
> >>>> > > First, to
> >>>> > > > > > > > eliminate access to Kafka for connected
clients, the
> >>>> current
> >>>> > > > > requirement
> >>>> > > > > > > > is
> >>>> > > > > > > > to remove all authorizations (i.e.
remove all ACLs).
> >>>> This is
> >>>> > > > > necessary
> >>>> > > > > > > > because of the long-lived nature
of the connections.  It
> >>>> is
> >>>> > > > > > > > operationally
> >>>> > > > > > > > simpler to shut off access at the
point of
> >>>> authentication, and
> >>>> > > with
> >>>> > > > > the
> >>>> > > > > > > > release of KIP-86: Configurable SASL
Callback Handlers
> >>>> > > > > > > > <
> >>>> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>> > > > > 86%3A+Configurable+SASL+callback+handlers
> >>>> > > > > > > >
> >>>> > > > > > > > it
> >>>> > > > > > > > is going to become more and more
likely that
> >>>> installations will
> >>>> > > > > > > > authenticate users against external
directories (e.g.
> via
> >>>> > > LDAP).  The
> >>>> > > > > > > > ability to stop Kafka access by simply
disabling an
> >>>> account in an
> >>>> > > > > LDAP
> >>>> > > > > > > > directory (for example) is desirable.
 The second
> >>>> motivating
> >>>> > > factor
> >>>> > > > > for
> >>>> > > > > > > > re-authentication related to security
is that the use of
> >>>> > > short-lived
> >>>> > > > > > > > tokens
> >>>> > > > > > > > is a common OAuth security recommendation,
but issuing a
> >>>> > > short-lived
> >>>> > > > > > > > token
> >>>> > > > > > > > to a Kafka client (or a broker when
OAUTHBEARER is the
> >>>> > > inter-broker
> >>>> > > > > > > > protocol) currently has no benefit
because once a client
> >>>> is
> >>>> > > > > connected to
> >>>> > > > > > > > a
> >>>> > > > > > > > broker the client is never challenged
again and the
> >>>> connection
> >>>> > > may
> >>>> > > > > > > > remain
> >>>> > > > > > > > intact beyond the token expiration
time (and may remain
> >>>> intact
> >>>> > > > > > > > indefinitely
> >>>> > > > > > > > under perfect circumstances).  This
KIP proposes adding
> >>>> the
> >>>> > > ability
> >>>> > > > > for
> >>>> > > > > > > > clients (and brokers when OAUTHBEARER
is the
> inter-broker
> >>>> > > protocol)
> >>>> > > > > to
> >>>> > > > > > > > re-authenticate their connections
to brokers and have
> the
> >>>> new
> >>>> > > bearer
> >>>> > > > > > > > token
> >>>> > > > > > > > appear on their session rather than
the old one.
> >>>> > > > > > > >
> >>>> > > > > > > > The description of this KIP is actually
quite
> >>>> straightforward
> >>>> > > from a
> >>>> > > > > > > > functionality perspective; from an
implementation
> >>>> perspective,
> >>>> > > > > though,
> >>>> > > > > > > the
> >>>> > > > > > > > KIP is not so straightforward, so
it includes a pull
> >>>> request
> >>>> > > with a
> >>>> > > > > > > > proposed implementation.  See
> https://github.com/apache/
> >>>> > > > > kafka/pull/5582.
> >>>> > > > > > > >
> >>>> > > > > > > > Ron
> >>>> > > > > > >
> >>>> > > > >
> >>>> > >
> >>>>
> >>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message