kafka-dev mailing list archives

From Jonghyun Lee <jonghy...@gmail.com>
Subject Re: [VOTE] KIP-219 - Improve Quota Communication
Date Wed, 02 May 2018 19:08:07 GMT
Hi Magnus and Jun, do you have any feedback on this?

Since two of the original voters (Becket and Rajini) showed a preference
for bumping up all request versions, I'll wait till tomorrow and start
implementing it unless someone expresses different opinions.

As for the implementation, the current plan is to add a method to
AbstractResponse that tells whether or not the client should throttle, with an
implementation in each response type based on its version.
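A minimal sketch of the idea (the class shapes and the version cut-off here are
illustrative assumptions, not the actual patch):

// Hypothetical sketch: a base-class default plus a per-response override.
abstract class AbstractResponse {
    // Default: the broker has not throttled on the client's behalf (pre-KIP-219
    // broker), so the client must apply the throttle time itself.
    boolean shouldClientThrottle(short version) {
        return false;
    }
}

class FetchResponse extends AbstractResponse {
    // Assume version 7 is the first fetch version bumped for this KIP. From that
    // version on, the broker has already muted the channel, so the client only
    // records the throttle time instead of backing off again.
    @Override
    boolean shouldClientThrottle(short version) {
        return version >= 7;
    }
}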

Thanks,
Jon


On Mon, Apr 30, 2018 at 9:15 AM, Jonghyun Lee <jonghyun2@gmail.com> wrote:

> Hi Rajini,
>
> Thanks for letting me know about the SASL use case. I think this is similar to
> the point Becket raised (creating a dependency between different requests), and
> it makes sense to me.
>
> Will wait for some more feedback and finalize.
>
> Thanks,
> Jon
>
>
>
> On Mon, Apr 30, 2018 at 3:46 AM, Rajini Sivaram <rajinisivaram@gmail.com>
> wrote:
>
>> Hi Jon,
>>
>> We currently send two ApiVersions requests for SASL, the first with
>> version
>> 0 to get the request versions for SASL and another after authentication.
>> You are currently using the second and basing all throttling decisions on
>> that. This may get in the way if we wanted to change this code in the
>> future (e.g. throttle SASL requests or combine the two ApiVersions
>> requests
>> into one). So if it is easy enough to check request versions against an
>> immutable map rather than track broker ApiVersions in a mutable set,
>> perhaps that would be better?
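A rough sketch of what that immutable map could look like on the client side
(the enum and the version numbers are illustrative placeholders, not the real
KIP-219 cut-offs):

import java.util.Map;

// Hypothetical sketch of the immutable-map approach.
final class Kip219Versions {

    enum ApiKeys { PRODUCE, FETCH, METADATA }  // stand-in for Kafka's real ApiKeys

    // Lowest version of each request type whose response already reflects
    // broker-side throttling. Built once, never mutated per broker connection.
    private static final Map<ApiKeys, Short> MIN_THROTTLE_AWARE_VERSION = Map.of(
            ApiKeys.PRODUCE, (short) 6,
            ApiKeys.FETCH, (short) 7);

    // Checked against the version actually used for the request; if the broker
    // already applied the throttle, the client only records the time.
    static boolean brokerAlreadyThrottled(ApiKeys api, short usedVersion) {
        Short min = MIN_THROTTLE_AWARE_VERSION.get(api);
        return min != null && usedVersion >= min;
    }
}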
>>
>> Hi Magnus, it would be good to know what works better for non-Java clients.
>>
>>
>> On Fri, Apr 27, 2018 at 8:34 PM, Jonghyun Lee <jonghyun2@gmail.com>
>> wrote:
>>
>> > Thanks, Becket.
>> >
>> > Assuming that requiring new client implementations to issue ApiVersionsRequest
>> > before issuing any other requests is reasonable, as you mentioned, I am wondering
>> > if keeping track of brokers that support KIP-219 is actually simpler than keeping
>> > a map from each request type to its minimum KIP-219 version and checking the
>> > version of every single request to see whether or not to throttle on the client
>> > side. To me, it looks like a broker property that can be cached, and checking
>> > every single request seems excessive.
>> >
>> > Thanks,
>> > Jon
>> >
>> >
>> >
>> >
>> > On Fri, Apr 27, 2018 at 11:36 AM, Becket Qin <becket.qin@gmail.com>
>> wrote:
>> >
>> > > The reason we wanted to bump up all the request versions was to let the
>> > > clients know whether the broker has already throttled the request or not.
>> > > This avoids double throttling in both brokers and clients, if the client
>> > > implementation supports KIP-219.
>> > >
>> > > The new change uses only ApiVersionsRequest, instead of all the requests, to
>> > > indicate whether the brokers have applied throttling or not. The difference
>> > > is that all the clients must issue ApiVersionsRequest first before issuing
>> > > any other requests. This has no impact on the existing clients. But client
>> > > implementations that want to support KIP-219 need to follow this practice,
>> > > which seems reasonable.
>> > > Initially I thought the change was fine. However, thinking about this again,
>> > > it means the client implementation needs to remember the ApiVersions of each
>> > > broker, which may have some complexity. It also seems to introduce a behavior
>> > > dependency between different requests, which seems unnecessary.
>> > >
>> > > Due to the above reasons, I think it might be better to bump up all the
>> > > request versions.
>> > >
>> > > Thanks,
>> > >
>> > > Jiangjie (Becket) Qin
>> > >
>> > >
>> > > On Wed, Apr 25, 2018 at 3:19 PM, Jonghyun Lee <jonghyun2@gmail.com>
>> > wrote:
>> > >
>> > > > Hello,
>> > > >
>> > > > Per Ismael's suggestion, I'd like to get comments from the original voters
>> > > > for KIP-219 (Becket, Jun, Rajini) and others about the new interface change
>> > > > proposed in the discussion thread (
>> > > > https://markmail.org/search/?q=kafka+KIP-219#query:kafka%20KIP-219+page:1+mid:5rwju2gwpicojr3f+state:results).
>> > > > Would you let me know?
>> > > >
>> > > > Thanks,
>> > > > Jon
>> > > >
>> > > >
>> > > > On Wed, Apr 25, 2018 at 2:43 PM, Dong Lin <lindong28@gmail.com>
>> wrote:
>> > > >
>> > > > > +Jon
>> > > > >
>> > > > > On Mon, 22 Jan 2018 at 10:38 AM Becket Qin <becket.qin@gmail.com>
>> > > wrote:
>> > > > >
>> > > > >> Thanks for the discussion and voting.
>> > > > >>
>> > > > >> KIP-219 has passed with +3 binding votes (Becket, Jun, Rajini).
>> > > > >>
>> > > > >> On Thu, Jan 18, 2018 at 1:32 AM, Rajini Sivaram <rajinisivaram@gmail.com>
>> > > > >> wrote:
>> > > > >>
>> > > > >> > Hi Becket,
>> > > > >> >
>> > > > >> > Thanks for the update. Yes, it does address my concern.
>> > > > >> >
>> > > > >> > +1 (binding)
>> > > > >> >
>> > > > >> > Regards,
>> > > > >> >
>> > > > >> > Rajini
>> > > > >> >
>> > > > >> > On Wed, Jan 17, 2018 at 5:24 PM, Becket Qin <becket.qin@gmail.com>
>> > > > >> > wrote:
>> > > > >> >
>> > > > >> > > Actually, returning an empty fetch response may still be useful to
>> > > > >> > > reduce the throttle time due to request quota violation, because the
>> > > > >> > > FetchResponse send time will be less. I just updated the KIP.
>> > > > >> > >
>> > > > >> > > Rajini, does that address your concern?
>> > > > >> > >
>> > > > >> > > On Tue, Jan 16, 2018 at 7:01 PM, Becket Qin <becket.qin@gmail.com>
>> > > > >> > > wrote:
>> > > > >> > >
>> > > > >> > > > Thanks for the reply, Jun.
>> > > > >> > > >
>> > > > >> > > > Currently the byte rate quota does not apply to HeartbeatRequest or
>> > > > >> > > > JoinGroupRequest/SyncGroupRequest. So the only case in which those
>> > > > >> > > > requests are throttled is when the request quota is violated. In that
>> > > > >> > > > case, the throttle time does not really change whether we return a
>> > > > >> > > > full FetchResponse or an empty one. Would it be more consistent if we
>> > > > >> > > > throttle based on the actual throttle time / channel mute time?
>> > > > >> > > >
>> > > > >> > > > Thanks,
>> > > > >> > > >
>> > > > >> > > > Jiangjie (Becket) Qin
>> > > > >> > > >
>> > > > >> > > > On Tue, Jan 16, 2018 at 3:45 PM, Jun Rao <jun@confluent.io> wrote:
>> > > > >> > > >
>> > > > >> > > >> Hi, Jiangjie,
>> > > > >> > > >>
>> > > > >> > > >> You are right that the heartbeat is done in a channel different
>> > > > >> > > >> from the fetch request. I think it's still useful to return an empty
>> > > > >> > > >> fetch response when the quota is violated. This way, the throttle
>> > > > >> > > >> time for the heartbeat request won't be large. I agree that we can
>> > > > >> > > >> just mute the channel for the fetch request for the throttle time
>> > > > >> > > >> computed based on a full fetch response. This probably also partially
>> > > > >> > > >> addresses Rajini's #1 concern.
>> > > > >> > > >>
>> > > > >> > > >> Thanks,
>> > > > >> > > >>
>> > > > >> > > >> Jun
>> > > > >> > > >>
>> > > > >> > > >> On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin <becket.qin@gmail.com>
>> > > > >> > > >> wrote:
>> > > > >> > > >>
>> > > > >> > > >> > Hi Rajini,
>> > > > >> > > >> >
>> > > > >> > > >> > Thanks for the comments. Please see the reply inline.
>> > > > >> > > >> >
>> > > > >> > > >> > Hi Jun,
>> > > > >> > > >> >
>> > > > >> > > >> > Thinking about the consumer rebalance case a bit more, I am not
>> > > > >> > > >> > sure if we need to worry about the delayed rebalance due to quota
>> > > > >> > > >> > violation or not. The rebalance actually uses a separate channel to
>> > > > >> > > >> > the coordinator. Therefore, unless the request quota is hit, the
>> > > > >> > > >> > rebalance won't be throttled. Even if the request quota is hit, it
>> > > > >> > > >> > seems unlikely to be delayed long. So it looks like we don't need
>> > > > >> > > >> > to unmute the channel earlier than needed. Does this address your
>> > > > >> > > >> > concern?
>> > > > >> > > >> >
>> > > > >> > > >> > Thanks,
>> > > > >> > > >> >
>> > > > >> > > >> > Jiangjie (Becket) Qin
>> > > > >> > > >> >
>> > > > >> > > >> > On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram <rajinisivaram@gmail.com>
>> > > > >> > > >> > wrote:
>> > > > >> > > >> >
>> > > > >> > > >> > > Hi Becket,
>> > > > >> > > >> > >
>> > > > >> > > >> > > A few questions:
>> > > > >> > > >> > >
>> > > > >> > > >> > > 1. KIP says: *Although older client implementations (prior to
>> > > > >> > > >> > > knowledge of this KIP) will immediately send the next request
>> > > > >> > > >> > > after the broker responds without paying attention to the throttle
>> > > > >> > > >> > > time field, the broker is protected by virtue of muting the channel
>> > > > >> > > >> > > for time X. i.e., the next request will not be processed until the
>> > > > >> > > >> > > channel is unmuted.*
>> > > > >> > > >> > > For fetch requests, the response is sent immediately and the mute
>> > > > >> > > >> > > time on the broker based on an empty fetch response will often be
>> > > > >> > > >> > > zero (unlike the throttle time returned to the client based on a
>> > > > >> > > >> > > non-empty response). Won't that lead to a tight loop of fetch
>> > > > >> > > >> > > requests from old clients (particularly expensive with SSL)?
>> > > > >> > > >> > > Wouldn't it be better to retain the current behaviour for old
>> > > > >> > > >> > > clients? Also, if we change the behaviour for old clients, client
>> > > > >> > > >> > > metrics that track throttle time will report a lot of throttling
>> > > > >> > > >> > > unrelated to the actual throttle time.
>> > > > >> > > >> > >
>> > > > >> > > >> > For consumers, if quota is violated, the throttle time on the
>> > > > >> > > >> > broker will not be 0. It is just that the throttle time will not be
>> > > > >> > > >> > increasing because the consumer will return an empty response in
>> > > > >> > > >> > this case. So there should not be a tight loop.
>> > > > >> > > >> >
>> > > > >> > > >> >
>> > > > >> > > >> > > 2. KIP says: *The usual idle timeout i.e., connections.max.idle.ms
>> > > > >> > > >> > > will still be honored during the throttle time X. This makes sure
>> > > > >> > > >> > > that the brokers will detect client connection closure in a bounded
>> > > > >> > > >> > > time.*
>> > > > >> > > >> > > Wouldn't it be better to bound the maximum throttle time to
>> > > > >> > > >> > > connections.max.idle.ms? If we mute for a time greater than
>> > > > >> > > >> > > connections.max.idle.ms and then close a client connection simply
>> > > > >> > > >> > > because we had muted it on the broker for a longer throttle time,
>> > > > >> > > >> > > we force a reconnection and read the next request from the new
>> > > > >> > > >> > > connection straight away. This feels the same as a bound of
>> > > > >> > > >> > > connections.max.idle.ms on the throttle time, but with added load
>> > > > >> > > >> > > on the broker for authenticating another connection (expensive with
>> > > > >> > > >> > > SSL).
>> > > > >> > > >> > >
>> > > > >> > > >> > I think we need to think about the consumer prior to and after
>> > > > >> > > >> > this KIP separately.
>> > > > >> > > >> >
>> > > > >> > > >> > For consumers prior to this KIP, even if we unmute the channel after
>> > > > >> > > >> > connections.max.idle.ms, it is likely that the consumers have already
>> > > > >> > > >> > reached request.timeout.ms before that and have reconnected to the
>> > > > >> > > >> > broker. So there is no real difference whether we close the throttled
>> > > > >> > > >> > channel or not.
>> > > > >> > > >> >
>> > > > >> > > >> > For consumers after the KIP, because they will honor the throttle
>> > > > >> > > >> > time, they will back off until the throttle time is reached. If that
>> > > > >> > > >> > throttle time is longer than connections.max.idle.ms, it seems not a
>> > > > >> > > >> > big overhead because there will only be one connection
>> > > > >> > > >> > re-establishment in quite a few minutes. Compared with such overhead,
>> > > > >> > > >> > it seems more important to honor the quota so the broker can survive.
>> > > > >> > > >> >
>> > > > >> > > >> >
>> > > > >> > > >> > > 3. Are we changing the behaviour of network bandwidth quota for
>> > > > >> > > >> > > Produce/Fetch and retaining the current behaviour for request
>> > > > >> > > >> > > quotas?
>> > > > >> > > >> > >
>> > > > >> > > >> > This is going to be applied to all the quotas, including request
>> > > > >> > > >> > quotas.
>> > > > >> > > >> >
>> > > > >> > > >> >
>> > > > >> > > >> > >
>> > > > >> > > >> > > Thanks,
>> > > > >> > > >> > >
>> > > > >> > > >> > > Rajini
>> > > > >> > > >> > >
>> > > > >> > > >> > >
>> > > > >> > > >> > >
>> > > > >> > > >> > > On Wed, Jan 10, 2018 at 10:29 PM, Jun Rao <jun@confluent.io>
>> > > > >> > > >> > > wrote:
>> > > > >> > > >> > >
>> > > > >> > > >> > > > Hi, Jiangjie,
>> > > > >> > > >> > > >
>> > > > >> > > >> > > > Thanks for the updated KIP. +1
>> > > > >> > > >> > > >
>> > > > >> > > >> > > > Jun
>> > > > >> > > >> > > >
>> > > > >> > > >> > > > On Mon, Jan 8, 2018 at 7:45 PM, Becket Qin <becket.qin@gmail.com>
>> > > > >> > > >> > > > wrote:
>> > > > >> > > >> > > >
>> > > > >> > > >> > > > > Thanks for the comments, Jun.
>> > > > >> > > >> > > > >
>> > > > >> > > >> > > > > 1. Good point.
>> > > > >> > > >> > > > > 2. Also makes sense. Usually connections.max.idle.ms is high
>> > > > >> > > >> > > > > enough so the throttling is impacted.
>> > > > >> > > >> > > > >
>> > > > >> > > >> > > > > I have updated the KIP to reflect the changes.
>> > > > >> > > >> > > > >
>> > > > >> > > >> > > > > Thanks,
>> > > > >> > > >> > > > >
>> > > > >> > > >> > > > > Jiangjie (Becket) Qin
>> > > > >> > > >> > > > >
>> > > > >> > > >> > > > > On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao <jun@confluent.io>
>> > > > >> > > >> > > > > wrote:
>> > > > >> > > >> > > > >
>> > > > >> > > >> > > > > > Hi, Jiangjie,
>> > > > >> > > >> > > > > >
>> > > > >> > > >> > > > > > Sorry for the late response. The proposal sounds good
>> > > > >> > > >> > > > > > overall. A couple of minor comments below.
>> > > > >> > > >> > > > > >
>> > > > >> > > >> > > > > > 1. For throttling a fetch request, we could potentially just
>> > > > >> > > >> > > > > > send an empty response. We can return a throttle time
>> > > > >> > > >> > > > > > calculated from a full response, but only mute the channel on
>> > > > >> > > >> > > > > > the server based on a throttle time calculated from the empty
>> > > > >> > > >> > > > > > response. This has the benefit that the server will mute the
>> > > > >> > > >> > > > > > channel for a much shorter time, which will prevent the
>> > > > >> > > >> > > > > > consumer from rebalancing when throttled.
>> > > > >> > > >> > > > > >
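A rough illustration of why the two numbers differ so much (the formula is a
simplification of the byte-rate quota math and the sizes are made up):

// Simplified stand-in for the byte-rate quota calculation.
class ThrottleTimeSketch {
    // Delay (ms) needed for responseBytes to fit under quotaBytesPerSec.
    static long throttleTimeMs(long responseBytes, double quotaBytesPerSec) {
        return (long) (responseBytes * 1000.0 / quotaBytesPerSec);
    }

    public static void main(String[] args) {
        double quota = 1024 * 1024;             // 1 MB/s byte-rate quota
        long fullResponse = 50L * 1024 * 1024;  // 50 MB fetch response
        long emptyResponse = 100;               // header-only empty response

        // Throttle time returned to the client so it backs off (~50 s here)...
        System.out.println(throttleTimeMs(fullResponse, quota));
        // ...while the channel is only muted for the empty-response time (~0 s).
        System.out.println(throttleTimeMs(emptyResponse, quota));
    }
}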
>> > > > >> > > >> > > > > > 2. The wiki says "connections.max.idle.ms should be ignored
>> > > > >> > > >> > > > > > during the throttle time X." This has the potential issue that
>> > > > >> > > >> > > > > > a server may not detect that a client connection is already
>> > > > >> > > >> > > > > > gone until after an arbitrary amount of time. Perhaps we could
>> > > > >> > > >> > > > > > still just close a connection if the server has muted it for
>> > > > >> > > >> > > > > > longer than connections.max.idle.ms. This will at least bound
>> > > > >> > > >> > > > > > the time for a server to detect closed client connections.
>> > > > >> > > >> > > > > >
>> > > > >> > > >> > > > > > Thanks,
>> > > > >> > > >> > > > > >
>> > > > >> > > >> > > > > > Jun
>> > > > >> > > >> > > > > >
>> > > > >> > > >> > > > > > On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin <becket.qin@gmail.com>
>> > > > >> > > >> > > > > > wrote:
>> > > > >> > > >> > > > > >
>> > > > >> > > >> > > > > > > Hi,
>> > > > >> > > >> > > > > > >
>> > > > >> > > >> > > > > > > We would like to start the voting thread for KIP-219. The
>> > > > >> > > >> > > > > > > KIP proposes to improve the quota communication between the
>> > > > >> > > >> > > > > > > brokers and clients, especially for cases of long throttling
>> > > > >> > > >> > > > > > > time.
>> > > > >> > > >> > > > > > >
>> > > > >> > > >> > > > > > > The KIP wiki is the following:
>> > > > >> > > >> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-219+-+Improve+quota+communication
>> > > > >> > > >> > > > > > >
>> > > > >> > > >> > > > > > > The discussion thread is here:
>> > > > >> > > >> > > > > > > http://markmail.org/search/?q=kafka+KIP-219#query:kafka%20KIP-219+page:1+mid:ooxabguy7nz7l7zy+state:results
>> > > > >> > > >> > > > > > >
>> > > > >> > > >> > > > > > > Thanks,
>> > > > >> > > >> > > > > > >
>> > > > >> > > >> > > > > > > Jiangjie (Becket) Qin
>> > > > >> > > >> > > > > > >
>> > > > >> > > >> > > > > >
>> > > > >> > > >> > > > >
>> > > > >> > > >> > > >
>> > > > >> > > >> > >
>> > > > >> > > >> >
>> > > > >> > > >>
>> > > > >> > > >
>> > > > >> > > >
>> > > > >> > >
>> > > > >> >
>> > > > >>
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
