kafka-dev mailing list archives

From Becket Qin <becket....@gmail.com>
Subject Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests
Date Wed, 18 Jul 2018 10:05:39 GMT
Hey Joel,

Thanks for the detailed explanation. I agree the current design makes sense.
My confusion is about whether the new config for the controller queue
capacity is necessary. I cannot think of a case in which users would change
it.

Thanks,

Jiangjie (Becket) Qin

On Wed, Jul 18, 2018 at 6:00 PM, Becket Qin <becket.qin@gmail.com> wrote:

> Hi Lucas,
>
> I guess my question can be rephrased to "do we expect users to ever change
> the controller request queue capacity"? If we agree that 20 is already a
> very generous default number and we do not expect users to change it, is it
> still necessary to expose this as a config?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Wed, Jul 18, 2018 at 2:29 AM, Lucas Wang <lucasatucla@gmail.com> wrote:
>
>> @Becket
>> 1. Thanks for the comment. You are right that normally there should be
>> just one controller request in the queue because of muting, and I had
>> NOT intended to say there would be many enqueued controller requests.
>> I went through the KIP again, and I'm not sure which part conveys that
>> info. I'd be happy to revise if you point out the section.
>>
>> 2. Though it should not happen in normal conditions, the current design
>> does not preclude multiple controllers running at the same time. Hence,
>> if we don't have the controller queue capacity config and simply set its
>> capacity to 1, network threads handling requests from different
>> controllers will be blocked during those troublesome times, which is
>> probably not what we want. On the other hand, adding the extra config
>> with a default value, say 20, guards us against issues in those
>> troublesome times, and IMO there isn't much downside to adding the
>> extra config; a minimal sketch of the resulting setup follows.
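>>
>> To make that concrete, here is a minimal Scala sketch of the two-queue
>> setup (the names, the ArrayBlockingQueue choice, and the drain order
>> are my assumptions, not necessarily the KIP's implementation):
>>
>>   import java.util.concurrent.{ArrayBlockingQueue, TimeUnit}
>>
>>   sealed trait Request
>>   case class ControlRequest(name: String) extends Request // LeaderAndIsr, ...
>>   case class DataRequest(name: String) extends Request    // Produce, Fetch, ...
>>
>>   val dataQueue    = new ArrayBlockingQueue[Request](500) // queued.max.requests
>>   val controlQueue = new ArrayBlockingQueue[Request](20)  // new config, default 20
>>
>>   // Request handler threads check the control queue first, so a
>>   // controller request never waits behind a backlog of ProduceRequests.
>>   def nextRequest(): Request = {
>>     var req: Request = null
>>     while (req == null) {
>>       req = controlQueue.poll()
>>       if (req == null)
>>         req = dataQueue.poll(300, TimeUnit.MILLISECONDS)
>>     }
>>     req
>>   }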
>>
>> @Mayuresh
>> Good catch, this sentence is an obsolete statement based on a previous
>> design. I've revised the wording in the KIP.
>>
>> Thanks,
>> Lucas
>>
>> On Tue, Jul 17, 2018 at 10:33 AM, Mayuresh Gharat <
>> gharatmayuresh15@gmail.com> wrote:
>>
>> > Hi Lucas,
>> >
>> > Thanks for the KIP.
>> > I am trying to understand why you think "The memory consumption can rise
>> > given the total number of queued requests can go up to 2x" in the impact
>> > section. Normally the requests from the controller to a broker are not
>> > high volume, right?
>> >
>> >
>> > Thanks,
>> >
>> > Mayuresh
>> >
>> > On Tue, Jul 17, 2018 at 5:06 AM Becket Qin <becket.qin@gmail.com>
>> wrote:
>> >
>> > > Thanks for the KIP, Lucas. Separating the control plane from the data
>> > plane
>> > > makes a lot of sense.
>> > >
>> > > In the KIP you mentioned that the controller request queue may have
>> > > many requests in it. Will this be a common case? The controller
>> > > requests still go through the SocketServer. The SocketServer will
>> > > mute the channel once a request is read and put into the request
>> > > channel. So assuming there is only one connection between the
>> > > controller and each broker, on the broker side there should be only
>> > > one controller request in the controller request queue at any given
>> > > time. If that is the case, do we need a separate controller request
>> > > queue capacity config? The default value 20 means that we expect 20
>> > > controller switches to happen in a short period of time. I am not
>> > > sure whether someone should increase the controller request queue
>> > > capacity to handle such a case, as it seems to indicate something
>> > > very wrong has happened.
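>> > >
>> > > To illustrate the muting argument, a toy Scala model of the cycle
>> > > (the real logic lives in SocketServer/KafkaChannel; this is only a
>> > > sketch, not the actual code):
>> > >
>> > >   import java.util.concurrent.ArrayBlockingQueue
>> > >
>> > >   case class Request(connectionId: String)
>> > >   class Channel(val id: String) { @volatile var muted = false }
>> > >
>> > >   val requestQueue = new ArrayBlockingQueue[Request](20)
>> > >
>> > >   // Network thread: after reading one complete request, mute the
>> > >   // channel so nothing more is read until the response is sent.
>> > >   def onRequestReceived(ch: Channel): Unit = {
>> > >     ch.muted = true
>> > >     requestQueue.put(Request(ch.id))
>> > >   }
>> > >
>> > >   // IO thread: unmute after the response goes out, letting the next
>> > >   // request in. With a single controller connection, this caps the
>> > >   // queued controller requests at one.
>> > >   def onResponseSent(ch: Channel): Unit = ch.muted = false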
>> > >
>> > > Thanks,
>> > >
>> > > Jiangjie (Becket) Qin
>> > >
>> > >
>> > > On Fri, Jul 13, 2018 at 1:10 PM, Dong Lin <lindong28@gmail.com>
>> wrote:
>> > >
>> > > > Thanks for the update Lucas.
>> > > >
>> > > > I think the motivation section is intuitive. It will be good to
>> learn
>> > > more
>> > > > about the comments from other reviewers.
>> > > >
>> > > > On Thu, Jul 12, 2018 at 9:48 PM, Lucas Wang <lucasatucla@gmail.com>
>> > > wrote:
>> > > >
>> > > > > Hi Dong,
>> > > > >
>> > > > > I've updated the motivation section of the KIP by explaining the
>> > cases
>> > > > that
>> > > > > would have user impacts.
>> > > > > Please take a look and let me know your comments.
>> > > > >
>> > > > > Thanks,
>> > > > > Lucas
>> > > > >
>> > > > > On Mon, Jul 9, 2018 at 5:53 PM, Lucas Wang <lucasatucla@gmail.com
>> >
>> > > > wrote:
>> > > > >
>> > > > > > Hi Dong,
>> > > > > >
>> > > > > > The simulation of disk being slow is merely for me to easily
>> > > construct
>> > > > a
>> > > > > > testing scenario
>> > > > > > with a backlog of produce requests. In production, other than
>> the
>> > > disk
>> > > > > > being slow, a backlog of
>> > > > > > produce requests may also be caused by high produce QPS.
>> > > > > > In that case, we may not want to kill the broker, and that's
>> > > > > > when this KIP can be useful, both for JBOD and non-JBOD setups.
>> > > > > >
>> > > > > > Going back to your previous question about each ProduceRequest
>> > > covering
>> > > > > 20
>> > > > > > partitions that are randomly
>> > > > > > distributed, let's say a LeaderAndIsr request is enqueued that
>> > tries
>> > > to
>> > > > > > switch the current broker, say broker0, from leader to follower
>> > > > > > *for one of the partitions*, say *test-0*. For the sake of
>> > argument,
>> > > > > > let's also assume the other brokers, say broker1, have *stopped*
>> > > > fetching
>> > > > > > from
>> > > > > > the current broker, i.e. broker0.
>> > > > > > 1. If the enqueued produce requests have acks = -1 (ALL)
>> > > > > >   1.1 without this KIP, the ProduceRequests ahead of
>> LeaderAndISR
>> > > will
>> > > > be
>> > > > > > put into the purgatory,
>> > > > > >         and since they'll never be replicated to other brokers
>> > > (because
>> > > > > of
>> > > > > > the assumption made above), they will
>> > > > > >         be completed either when the LeaderAndISR request is
>> > > processed
>> > > > or
>> > > > > > when the timeout happens.
>> > > > > >   1.2 With this KIP, broker0 will immediately transition the
>> > > > > > partition test-0 to become a follower; after the current broker
>> > > > > > sees the replication of the remaining 19 partitions, it can send
>> > > > > > a response indicating that it's no longer the leader for "test-0".
>> > > > > >   To see the latency difference between 1.1 and 1.2, let's say
>> > > > > > there are 24K produce requests ahead of the LeaderAndISR, and
>> > > > > > there are 8 io threads, so each io thread will process
>> > > > > > approximately 3000 produce requests. Now let's investigate the io
>> > > > > > thread that finally processed the LeaderAndISR.
>> > > > > >   For the 3000 produce requests, let's model the times when their
>> > > > > > remaining 19 partitions catch up as t0, t1, ..., t2999, and say
>> > > > > > the LeaderAndISR request is processed at time t3000.
>> > > > > >   Without this KIP, the 1st produce request would have waited an
>> > > > > > extra t3000 - t0 in the purgatory, the 2nd an extra t3000 - t1,
>> > > > > > etc.
>> > > > > >   Roughly speaking, the latency difference is bigger for the
>> > > > > > earlier produce requests than for the later ones. For the same
>> > > > > > reason, the more ProduceRequests queued before the LeaderAndISR,
>> > > > > > the bigger the benefit we get (capped by the produce timeout).
>> > > > > > A small worked example follows item 2 below.
>> > > > > > 2. If the enqueued produce requests have acks=0 or acks=1
>> > > > > >   There will be no latency differences in this case, but
>> > > > > >   2.1 without this KIP, the records of partition test-0 in the
>> > > > > > ProduceRequests ahead of the LeaderAndISR will be appended to the
>> > > > > > local log, and eventually be truncated after processing the
>> > > > > > LeaderAndISR. This is what's referred to as "some unofficial
>> > > > > > definition of data loss in terms of messages beyond the high
>> > > > > > watermark".
>> > > > > >   2.2 with this KIP, we can mitigate the effect since if the
>> > > > > > LeaderAndISR is immediately processed, the response to producers
>> > > > > > will have the NotLeaderForPartition error, causing producers to
>> > > > > > retry.
>> > > > > >
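>> > > > > > As promised above, a small worked example of the purgatory math
>> > > > > > in Scala (evenly spaced catch-up times are an assumption, and the
>> > > > > > numbers are illustrative, not measured):
>> > > > > >
>> > > > > >   val n = 3000                         // produce requests per io thread
>> > > > > >   val spacingMs = 2.0                  // assumed gap between consecutive t_i
>> > > > > >   val t = (0 to n).map(_ * spacingMs)  // t(0) .. t(3000)
>> > > > > >   // extra purgatory wait of request i without the KIP: t3000 - t_i
>> > > > > >   val extraWaits = (0 until n).map(i => t(n) - t(i))
>> > > > > >   val avgExtraMs = extraWaits.sum / n  // ~3000 ms, i.e. ~3 s on average
>> > > > > >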
>> > > > > > The explanation above covers the benefit of reducing the latency
>> > > > > > of a broker becoming a follower; closely related is reducing the
>> > > > > > latency of a broker becoming the leader. In this case, the
>> > > > > > benefit is even more obvious: if other brokers have resigned
>> > > > > > leadership and the current broker should take leadership, any
>> > > > > > delay in processing the LeaderAndISR will be perceived by clients
>> > > > > > as unavailability. In extreme cases, this can cause failed
>> > > > > > produce requests if the retries are exhausted.
>> > > > > >
>> > > > > > Two other types of controller requests are UpdateMetadata and
>> > > > > > StopReplica, which I'll briefly discuss as follows:
>> > > > > > For UpdateMetadata requests, delayed processing means clients
>> > > > > > receiving stale metadata, e.g. with the wrong leadership info for
>> > > > > > certain partitions, and the effect is more retries or even fatal
>> > > > > > failures if the retries are exhausted.
>> > > > > >
>> > > > > > For StopReplica requests, a long queuing time may degrade the
>> > > > > > performance of topic deletion.
>> > > > > >
>> > > > > > Regarding your last question about the delay for
>> > > > > > DescribeLogDirsRequest, you are right that this KIP cannot help
>> > > > > > with the latency in getting the log dirs info, and it's only
>> > > > > > relevant when controller requests are involved.
>> > > > > >
>> > > > > > Regards,
>> > > > > > Lucas
>> > > > > >
>> > > > > >
>> > > > > > On Tue, Jul 3, 2018 at 5:11 PM, Dong Lin <lindong28@gmail.com>
>> > > wrote:
>> > > > > >
>> > > > > >> Hey Jun,
>> > > > > >>
>> > > > > >> Thanks much for the comments. It is a good point. So the feature
>> > > > > >> may be useful for the JBOD use case. I have one question below.
>> > > > > >>
>> > > > > >> Hey Lucas,
>> > > > > >>
>> > > > > >> Do you think this feature is also useful for a non-JBOD setup,
>> > > > > >> or is it only useful for the JBOD setup? It may be useful to
>> > > > > >> understand this.
>> > > > > >>
>> > > > > >> When the broker is set up using JBOD, in order to move leaders
>> > > > > >> on the failed disk to other disks, the system operator first
>> > > > > >> needs to get the list of
>> > > > > >> partitions on the failed disk. This is currently achieved using
>> > > > > >> AdminClient.describeLogDirs(), which sends
>> DescribeLogDirsRequest
>> > to
>> > > > the
>> > > > > >> broker. If we only prioritize the controller requests, then the
>> > > > > >> DescribeLogDirsRequest
>> > > > > >> may still take a long time to be processed by the broker. So
>> the
>> > > > overall
>> > > > > >> time to move leaders away from the failed disk may still be
>> long
>> > > even
>> > > > > with
>> > > > > >> this KIP. What do you think?
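>> > > > > >>
>> > > > > >> For reference, the lookup described above is roughly the
>> > > > > >> following (a sketch against the KIP-113-era AdminClient; the
>> > > > > >> broker id and bootstrap address are placeholders):
>> > > > > >>
>> > > > > >>   import org.apache.kafka.clients.admin.AdminClient
>> > > > > >>
>> > > > > >>   val props = new java.util.Properties()
>> > > > > >>   props.put("bootstrap.servers", "broker0:9092")
>> > > > > >>   val admin = AdminClient.create(props)
>> > > > > >>   // broker 0 -> Map[logDir, LogDirInfo]; a failed dir reports an
>> > > > > >>   // error, and its replicaInfos list the partitions whose
>> > > > > >>   // leaders need to be moved
>> > > > > >>   val result = admin.describeLogDirs(
>> > > > > >>     java.util.Collections.singleton(Integer.valueOf(0)))
>> > > > > >>   val dirsOnBroker0 = result.all().get().get(Integer.valueOf(0))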
>> > > > > >>
>> > > > > >> Thanks,
>> > > > > >> Dong
>> > > > > >>
>> > > > > >>
>> > > > > >> On Tue, Jul 3, 2018 at 4:38 PM, Lucas Wang <
>> lucasatucla@gmail.com
>> > >
>> > > > > wrote:
>> > > > > >>
>> > > > > >> > Thanks for the insightful comment, Jun.
>> > > > > >> >
>> > > > > >> > @Dong,
>> > > > > >> > Since both comments in your previous email are about the
>> > > > > >> > benefits of this KIP and whether it's useful,
>> > > > > >> > in light of Jun's last comment, do you agree that this KIP
>> can
>> > be
>> > > > > >> > beneficial in the case mentioned by Jun?
>> > > > > >> > Please let me know, thanks!
>> > > > > >> >
>> > > > > >> > Regards,
>> > > > > >> > Lucas
>> > > > > >> >
>> > > > > >> > On Tue, Jul 3, 2018 at 2:07 PM, Jun Rao <jun@confluent.io>
>> > wrote:
>> > > > > >> >
>> > > > > >> > > Hi, Lucas, Dong,
>> > > > > >> > >
>> > > > > >> > > If all disks on a broker are slow, one probably should just
>> > kill
>> > > > the
>> > > > > >> > > broker. In that case, this KIP may not help. If only one of
>> > the
>> > > > > disks
>> > > > > >> on
>> > > > > >> > a
>> > > > > >> > > broker is slow, one may want to fail that disk and move the
>> > > > leaders
>> > > > > on
>> > > > > >> > that
>> > > > > >> > > disk to other brokers. In that case, being able to process
>> the
>> > > > > >> > LeaderAndIsr
>> > > > > >> > > requests faster will potentially help the producers recover
>> > > > quicker.
>> > > > > >> > >
>> > > > > >> > > Thanks,
>> > > > > >> > >
>> > > > > >> > > Jun
>> > > > > >> > >
>> > > > > >> > > On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <
>> lindong28@gmail.com
>> > >
>> > > > > wrote:
>> > > > > >> > >
>> > > > > >> > > > Hey Lucas,
>> > > > > >> > > >
>> > > > > >> > > > Thanks for the reply. Some follow up questions below.
>> > > > > >> > > >
>> > > > > >> > > > Regarding 1, if each ProduceRequest covers 20 partitions
>> > that
>> > > > are
>> > > > > >> > > randomly
>> > > > > >> > > > distributed across all partitions, then each
>> ProduceRequest
>> > > will
>> > > > > >> likely
>> > > > > >> > > > cover some partitions for which the broker is still
>> leader
>> > > after
>> > > > > it
>> > > > > >> > > quickly
>> > > > > >> > > > processes the
>> > > > > >> > > > LeaderAndIsrRequest. Then the broker will still be slow in
>> > > > > >> > > > processing these ProduceRequests and the request latency
>> > > > > >> > > > will still be very high with this KIP. It seems that most
>> > > > > >> > > > ProduceRequests will still time out after 30 seconds. Is
>> > > > > >> > > > this understanding correct?
>> > > > > >> > > >
>> > > > > >> > > > Regarding 2, if most ProduceRequests will still time out
>> > > > > >> > > > after 30 seconds,
>> > > > > >> > > > then it is less clear how this KIP reduces average
>> produce
>> > > > > latency.
>> > > > > >> Can
>> > > > > >> > > you
>> > > > > >> > > > clarify what metrics can be improved by this KIP?
>> > > > > >> > > >
>> > > > > >> > > > Not sure why the system operator directly cares about the
>> > > > > >> > > > number of truncated messages.
>> > > > > >> > > > Do you mean this KIP can improve average throughput or
>> > reduce
>> > > > > >> message
>> > > > > >> > > > duplication? It will be good to understand this.
>> > > > > >> > > >
>> > > > > >> > > > Thanks,
>> > > > > >> > > > Dong
>> > > > > >> > > >
>> > > > > >> > > >
>> > > > > >> > > >
>> > > > > >> > > >
>> > > > > >> > > >
>> > > > > >> > > > On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <
>> > > lucasatucla@gmail.com
>> > > > >
>> > > > > >> > wrote:
>> > > > > >> > > >
>> > > > > >> > > > > Hi Dong,
>> > > > > >> > > > >
>> > > > > >> > > > > Thanks for your valuable comments. Please see my reply
>> > > below.
>> > > > > >> > > > >
>> > > > > >> > > > > 1. The Google doc showed only 1 partition. Now let's
>> > > consider
>> > > > a
>> > > > > >> more
>> > > > > >> > > > common
>> > > > > >> > > > > scenario
>> > > > > >> > > > > where broker0 is the leader of many partitions. And
>> let's
>> > > say
>> > > > > for
>> > > > > >> > some
>> > > > > >> > > > > reason its IO becomes slow.
>> > > > > >> > > > > The number of leader partitions on broker0 is so large,
>> > say
>> > > > 10K,
>> > > > > >> that
>> > > > > >> > > the
>> > > > > >> > > > > cluster is skewed,
>> > > > > >> > > > > and the operator would like to shift the leadership
>> for a
>> > > lot
>> > > > of
>> > > > > >> > > > > partitions, say 9K, to other brokers,
>> > > > > >> > > > > either manually or through some service like Cruise
>> > > > > >> > > > > Control. With this KIP, not only will the leadership
>> > > > > >> > > > > transitions finish more quickly, helping the cluster
>> > > > > >> > > > > itself become more balanced,
>> > > > > >> > > > > but all existing producers corresponding to the 9K
>> > > partitions
>> > > > > will
>> > > > > >> > get
>> > > > > >> > > > the
>> > > > > >> > > > > errors relatively quickly
>> > > > > >> > > > > rather than relying on their timeout, thanks to the
>> > batched
>> > > > > async
>> > > > > >> ZK
>> > > > > >> > > > > operations.
>> > > > > >> > > > > To me it's a useful feature to have during such
>> > troublesome
>> > > > > times.
>> > > > > >> > > > >
>> > > > > >> > > > >
>> > > > > >> > > > > 2. The experiments in the Google Doc have shown that
>> with
>> > > this
>> > > > > KIP
>> > > > > >> > many
>> > > > > >> > > > > producers
>> > > > > >> > > > > receive an explicit error NotLeaderForPartition, based
>> on
>> > > > which
>> > > > > >> they
>> > > > > >> > > > retry
>> > > > > >> > > > > immediately.
>> > > > > >> > > > > Therefore the latency (~14 seconds + quick retry) for
>> > > > > >> > > > > their single
>> > > > > >> > > message
>> > > > > >> > > > is
>> > > > > >> > > > > much smaller
>> > > > > >> > > > > compared with the case of timing out without the KIP
>> (30
>> > > > seconds
>> > > > > >> for
>> > > > > >> > > > timing
>> > > > > >> > > > > out + quick retry).
>> > > > > >> > > > > One might argue that reducing the timeout on the
>> > > > > >> > > > > producer side can achieve the same result,
>> > > > > >> > > > > yet reducing the timeout has its own drawbacks[1].
>> > > > > >> > > > >
>> > > > > >> > > > > Also *IF* there were a metric to show the number of
>> > > truncated
>> > > > > >> > messages
>> > > > > >> > > on
>> > > > > >> > > > > brokers,
>> > > > > >> > > > > with the experiments done in the Google Doc, it should
>> be
>> > > easy
>> > > > > to
>> > > > > >> see
>> > > > > >> > > > that
>> > > > > >> > > > > a lot fewer messages need
>> > > > > >> > > > > to be truncated on broker0, since the up-to-date
>> > > > > >> > > > > metadata avoids appending messages
>> > > > > >> > > > > in subsequent PRODUCE requests. If we talk to a system
>> > > > operator
>> > > > > >> and
>> > > > > >> > ask
>> > > > > >> > > > > whether
>> > > > > >> > > > > they prefer fewer wasteful IOs, I bet most likely the
>> > answer
>> > > > is
>> > > > > >> yes.
>> > > > > >> > > > >
>> > > > > >> > > > > 3. To answer your question, I think it might be
>> helpful to
>> > > > > >> construct
>> > > > > >> > > some
>> > > > > >> > > > > formulas.
>> > > > > >> > > > > To simplify the modeling, I'm going back to the case
>> where
>> > > > there
>> > > > > >> is
>> > > > > >> > > only
>> > > > > >> > > > > ONE partition involved.
>> > > > > >> > > > > Following the experiments in the Google Doc, let's say
>> > > broker0
>> > > > > >> > becomes
>> > > > > >> > > > the
>> > > > > >> > > > > follower at time t0,
>> > > > > >> > > > > and after t0 there were still N produce requests in its
>> > > > request
>> > > > > >> > queue.
>> > > > > >> > > > > With the up-to-date metadata brought by this KIP,
>> > > > > >> > > > > broker0 can reply with a NotLeaderForPartition exception,
>> > > > > >> > > > > let's use M1 to denote the average processing time of
>> > > replying
>> > > > > >> with
>> > > > > >> > > such
>> > > > > >> > > > an
>> > > > > >> > > > > error message.
>> > > > > >> > > > > Without this KIP, the broker will need to append
>> > > > > >> > > > > messages to segments, which may trigger a flush to disk;
>> > > > > >> > > > > let's use M2 to denote the average processing time for
>> > > > > >> > > > > such logic.
>> > > > > >> > > > > Then the average extra latency incurred without this KIP
>> > > > > >> > > > > is N * (M2 - M1) / 2.
>> > > > > >> > > > >
>> > > > > >> > > > > In practice, M2 should always be larger than M1, which
>> > > > > >> > > > > means as long as N is positive, we would see improvements
>> > > > > >> > > > > in the average latency.
>> > > > > >> > > > > There does not need to be a significant backlog of
>> > > > > >> > > > > requests in the request queue, or severe degradation of
>> > > > > >> > > > > disk performance, to get the improvement.
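>> > > > > >> > > > >
>> > > > > >> > > > > To put rough numbers on the formula (the values below are
>> > > > > >> > > > > illustrative assumptions, not measurements):
>> > > > > >> > > > >
>> > > > > >> > > > >   val n  = 3000   // produce requests queued ahead of the LeaderAndISR
>> > > > > >> > > > >   val m1 = 0.1    // ms: reply with NotLeaderForPartition (with this KIP)
>> > > > > >> > > > >   val m2 = 1.1    // ms: append to the log, possibly flushing (without it)
>> > > > > >> > > > >   val avgExtraLatencyMs = n * (m2 - m1) / 2   // = 1500 ms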
>> > > > > >> > > > >
>> > > > > >> > > > > Regards,
>> > > > > >> > > > > Lucas
>> > > > > >> > > > >
>> > > > > >> > > > >
>> > > > > >> > > > > [1] For instance, reducing the timeout on the producer
>> > side
>> > > > can
>> > > > > >> > trigger
>> > > > > >> > > > > unnecessary duplicate requests
>> > > > > >> > > > > when the corresponding leader broker is overloaded,
>> > > > exacerbating
>> > > > > >> the
>> > > > > >> > > > > situation.
>> > > > > >> > > > >
>> > > > > >> > > > > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <
>> > > lindong28@gmail.com
>> > > > >
>> > > > > >> > wrote:
>> > > > > >> > > > >
>> > > > > >> > > > > > Hey Lucas,
>> > > > > >> > > > > >
>> > > > > >> > > > > > Thanks much for the detailed documentation of the
>> > > > experiment.
>> > > > > >> > > > > >
>> > > > > >> > > > > > Initially I also thought having a separate queue for
>> > > > > >> > > > > > controller requests is
>> > > > > >> > > > > > useful because, as you mentioned in the summary
>> section
>> > of
>> > > > the
>> > > > > >> > Google
>> > > > > >> > > > > doc,
>> > > > > >> > > > > > controller requests are generally more important than
>> > data
>> > > > > >> requests
>> > > > > >> > > and
>> > > > > >> > > > > we
>> > > > > >> > > > > > probably want controller requests to be processed
>> > sooner.
>> > > > But
>> > > > > >> then
>> > > > > >> > > Eno
>> > > > > >> > > > > has
>> > > > > >> > > > > > two very good questions which I am not sure the
>> Google
>> > doc
>> > > > has
>> > > > > >> > > answered
>> > > > > >> > > > > > explicitly. Could you help with the following
>> questions?
>> > > > > >> > > > > >
>> > > > > >> > > > > > 1) It is not very clear what is the actual benefit of
>> > > > KIP-291
>> > > > > to
>> > > > > >> > > users.
>> > > > > >> > > > > The
>> > > > > >> > > > > > experiment setup in the Google doc simulates the
>> > scenario
>> > > > that
>> > > > > >> > broker
>> > > > > >> > > > is
>> > > > > >> > > > > > very slow handling ProduceRequest due to e.g. slow
>> disk.
>> > > It
>> > > > > >> > currently
>> > > > > >> > > > > > assumes that there is only 1 partition. But in the
>> > common
>> > > > > >> scenario,
>> > > > > >> > > it
>> > > > > >> > > > is
>> > > > > >> > > > > > probably reasonable to assume that there are many
>> other
>> > > > > >> partitions
>> > > > > >> > > that
>> > > > > >> > > > > are
>> > > > > >> > > > > > also actively produced to, and ProduceRequests to these
>> > > > > >> > > > > > partitions also take e.g. 2 seconds to be processed. So
>> > > > > >> > > > > > even if broker0 can become follower for the partition 0
>> > > > > >> > > > > > soon, it probably still needs to slowly process the
>> > > > > >> > > > > > ProduceRequests in the queue because these
>> > > > > >> > > > > > ProduceRequests cover other partitions.
>> > > > > >> > > > > > Thus most ProduceRequests will still time out after 30
>> > > > > >> > > > > > seconds and most clients will still likely time out
>> > > > > >> > > > > > after 30 seconds. Then it is not obvious what the
>> > > > > >> > > > > > benefit to the client is, since the client will time
>> > > > > >> > > > > > out after 30 seconds before possibly re-connecting to
>> > > > > >> > > > > > broker1, with or without KIP-291. Did I miss something
>> > > > > >> > > > > > here?
>> > > > > >> > > > > >
>> > > > > >> > > > > > 2) I guess Eno is asking for the specific benefits of
>> > > > > >> > > > > > this KIP to the user or system administrator, e.g.
>> > > > > >> > > > > > whether this KIP decreases average latency, 999th
>> > > > > >> > > > > > percentile latency, the probability of exceptions
>> > > > > >> > > > > > exposed to the client, etc. It is probably useful to
>> > > > > >> > > > > > clarify this.
>> > > > > >> > > > > >
>> > > > > >> > > > > > 3) Does this KIP help improve user experience only when
>> > > > > >> > > > > > there is an issue with the broker, e.g. a significant
>> > > > > >> > > > > > backlog in the request queue
>> > due
>> > > to
>> > > > > >> slow
>> > > > > >> > > disk
>> > > > > >> > > > as
>> > > > > >> > > > > > described in the Google doc? Or is this KIP also
>> useful
>> > > when
>> > > > > >> there
>> > > > > >> > is
>> > > > > >> > > > no
>> > > > > >> > > > > > ongoing issue in the cluster? It might be helpful to
>> > > clarify
>> > > > > >> this
>> > > > > >> > to
>> > > > > >> > > > > > understand the benefit of this KIP.
>> > > > > >> > > > > >
>> > > > > >> > > > > >
>> > > > > >> > > > > > Thanks much,
>> > > > > >> > > > > > Dong
>> > > > > >> > > > > >
>> > > > > >> > > > > >
>> > > > > >> > > > > >
>> > > > > >> > > > > >
>> > > > > >> > > > > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <
>> > > > > >> lucasatucla@gmail.com
>> > > > > >> > >
>> > > > > >> > > > > wrote:
>> > > > > >> > > > > >
>> > > > > >> > > > > > > Hi Eno,
>> > > > > >> > > > > > >
>> > > > > >> > > > > > > Sorry for the delay in getting the experiment
>> results.
>> > > > > >> > > > > > > Here is a link to the positive impact achieved by
>> > > > > implementing
>> > > > > >> > the
>> > > > > >> > > > > > proposed
>> > > > > >> > > > > > > change:
>> > > > > > > https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhWFWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
>> > > > > >> > > > > > > Please take a look when you have time and let me
>> know
>> > > your
>> > > > > >> > > feedback.
>> > > > > >> > > > > > >
>> > > > > >> > > > > > > Regards,
>> > > > > >> > > > > > > Lucas
>> > > > > >> > > > > > >
>> > > > > >> > > > > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <
>> > > kafka@harsha.io>
>> > > > > >> wrote:
>> > > > > >> > > > > > >
>> > > > > >> > > > > > > > Thanks for the pointer. Will take a look; it might
>> > > > > >> > > > > > > > suit our requirements better.
>> > > > > >> > > > > > > >
>> > > > > >> > > > > > > > Thanks,
>> > > > > >> > > > > > > > Harsha
>> > > > > >> > > > > > > >
>> > > > > >> > > > > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <
>> > > > > >> > > > lucasatucla@gmail.com
>> > > > > >> > > > > >
>> > > > > >> > > > > > > > wrote:
>> > > > > >> > > > > > > >
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > Hi Harsha,
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > If I understand correctly, the replication quota
>> > > > > >> > > > > > > > > mechanism proposed in KIP-73 can be helpful in
>> > > > > >> > > > > > > > > that scenario.
>> > > > > >> > > > > > > > > Have you tried it out?
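>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > In case it helps, a sketch of applying the KIP-73
>> > > > > >> > > > > > > > > throttle through the AdminClient (the config name
>> > > > > >> > > > > > > > > comes from KIP-73; the broker id, address, and
>> > > > > >> > > > > > > > > 10 MB/s rate are placeholder assumptions):
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > >   import java.util.Collections
>> > > > > >> > > > > > > > >   import org.apache.kafka.clients.admin.{AdminClient, Config, ConfigEntry}
>> > > > > >> > > > > > > > >   import org.apache.kafka.common.config.ConfigResource
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > >   val props = new java.util.Properties()
>> > > > > >> > > > > > > > >   props.put("bootstrap.servers", "broker0:9092")
>> > > > > >> > > > > > > > >   val admin = AdminClient.create(props)
>> > > > > >> > > > > > > > >   // throttle follower fetch traffic on broker 1
>> > > > > >> > > > > > > > >   val broker = new ConfigResource(ConfigResource.Type.BROKER, "1")
>> > > > > >> > > > > > > > >   val throttle = new Config(Collections.singletonList(
>> > > > > >> > > > > > > > >     new ConfigEntry("follower.replication.throttled.rate", "10485760")))
>> > > > > >> > > > > > > > >   admin.alterConfigs(Collections.singletonMap(broker, throttle)).all().get()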
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > Thanks,
>> > > > > >> > > > > > > > > Lucas
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha <
>> > > > > kafka@harsha.io
>> > > > > >> >
>> > > > > >> > > > wrote:
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > Hi Lucas,
>> > > > > >> > > > > > > > > > One more question: any thoughts on making this
>> > > > > >> > > > > > > > > > configurable and also allowing a subset of data
>> > > > > >> > > > > > > > > > requests to be prioritized? For example, we
>> > > > > >> > > > > > > > > > notice in our cluster that when we take out a
>> > > > > >> > > > > > > > > > broker and bring a new one in, it will try to
>> > > > > >> > > > > > > > > > become a follower and issue a lot of fetch
>> > > > > >> > > > > > > > > > requests to other leaders in the cluster. This
>> > > > > >> > > > > > > > > > will negatively affect the application/client
>> > > > > >> > > > > > > > > > requests.
>> > > > > >> > > > > > > > > > We are also exploring a similar solution to
>> > > > > >> > > > > > > > > > de-prioritize fetch requests if a new replica
>> > > > > >> > > > > > > > > > comes in; we are ok with the replica taking
>> > > > > >> > > > > > > > > > time, but the leaders should prioritize the
>> > > > > >> > > > > > > > > > client requests.
>> > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > Thanks,
>> > > > > >> > > > > > > > > > Harsha
>> > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang
>> > > wrote:
>> > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > Hi Eno,
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > Sorry for the delayed response.
>> > > > > >> > > > > > > > > > > - I haven't implemented the feature yet, so no
>> > > > > >> > > > > > > > > > > experimental results so far. And I plan to
>> > > > > >> > > > > > > > > > > test it out in the following days.
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > - You are absolutely right that the
>> priority
>> > > queue
>> > > > > >> does
>> > > > > >> > not
>> > > > > >> > > > > > > > completely
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > > prevent
>> > > > > >> > > > > > > > > > > data requests being processed ahead of
>> > > controller
>> > > > > >> > requests.
>> > > > > >> > > > > > > > > > > That being said, I expect it to greatly
>> > > > > >> > > > > > > > > > > mitigate the effect of stale metadata.
>> > > > > >> > > > > > > > > > > In any case, I'll try it out and post the
>> > > results
>> > > > > >> when I
>> > > > > >> > > have
>> > > > > >> > > > > it.
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > Regards,
>> > > > > >> > > > > > > > > > > Lucas
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno
>> Thereska
>> > <
>> > > > > >> > > > > > > > eno.thereska@gmail.com
>> > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > > wrote:
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > Hi Lucas,
>> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > Sorry for the delay, just had a look at
>> > this.
>> > > A
>> > > > > >> couple
>> > > > > >> > of
>> > > > > >> > > > > > > > questions:
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > > > - did you notice any positive change
>> after
>> > > > > >> implementing
>> > > > > >> > > > this
>> > > > > >> > > > > > KIP?
>> > > > > >> > > > > > > > > I'm
>> > > > > >> > > > > > > > > > > > wondering if you have any experimental
>> > results
>> > > > > that
>> > > > > >> > show
>> > > > > >> > > > the
>> > > > > >> > > > > > > > benefit
>> > > > > >> > > > > > > > > of
>> > > > > >> > > > > > > > > > > the
>> > > > > >> > > > > > > > > > > > two queues.
>> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > - priority is usually not sufficient in
>> > > > addressing
>> > > > > >> the
>> > > > > >> > > > > problem
>> > > > > >> > > > > > > the
>> > > > > >> > > > > > > > > KIP
>> > > > > >> > > > > > > > > > > > identifies. Even with priority queues,
>> you
>> > > will
>> > > > > >> > sometimes
>> > > > > >> > > > > > > (often?)
>> > > > > >> > > > > > > > > have
>> > > > > >> > > > > > > > > > > the
>> > > > > >> > > > > > > > > > > > case that data plane requests will be
>> ahead
>> > of
>> > > > the
>> > > > > >> > > control
>> > > > > >> > > > > > plane
>> > > > > >> > > > > > > > > > > requests.
>> > > > > >> > > > > > > > > > > > This happens because the system might
>> have
>> > > > already
>> > > > > >> > > started
>> > > > > >> > > > > > > > > processing
>> > > > > >> > > > > > > > > > > the
>> > > > > >> > > > > > > > > > > > data plane requests before the control
>> plane
>> > > > ones
>> > > > > >> > > arrived.
>> > > > > >> > > > So
>> > > > > >> > > > > > it
>> > > > > >> > > > > > > > > would
>> > > > > >> > > > > > > > > > > be
>> > > > > >> > > > > > > > > > > > good to know what % of the problem this
>> KIP
>> > > > > >> addresses.
>> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > Thanks
>> > > > > >> > > > > > > > > > > > Eno
>> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu <
>> > > > > >> > > > > yuzhihong@gmail.com
>> > > > > >> > > > > > >
>> > > > > >> > > > > > > > > wrote:
>> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > Change looks good.
>> > > > > >> > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > Thanks
>> > > > > >> > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas
>> > Wang
>> > > <
>> > > > > >> > > > > > > > lucasatucla@gmail.com
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > wrote:
>> > > > > >> > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > Hi Ted,
>> > > > > >> > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > Thanks for the suggestion. I've
>> updated
>> > > the
>> > > > > KIP.
>> > > > > >> > > Please
>> > > > > >> > > > > > take
>> > > > > >> > > > > > > > > > another
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > look.
>> > > > > >> > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > Lucas
>> > > > > >> > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted
>> Yu
>> > <
>> > > > > >> > > > > > > yuzhihong@gmail.com
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > > wrote:
>> > > > > >> > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > Currently in KafkaConfig.scala :
>> > > > > >> > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > val QueuedMaxRequests = 500
>> > > > > >> > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > It would be good if you can include
>> > > > > >> > > > > > > > > > > > > > > the default value for this new config
>> > > > > >> > > > > > > > > > > > > > > in the KIP.
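>> > > > > >> > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > For instance, following the existing
>> > > > > >> > > > > > > > > > > > > > > pattern (the name and default below
>> > > > > >> > > > > > > > > > > > > > > are my assumptions until the KIP pins
>> > > > > >> > > > > > > > > > > > > > > them down):
>> > > > > >> > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > >   val QueuedMaxRequests = 500        // existing data-plane bound
>> > > > > >> > > > > > > > > > > > > > >   val QueuedMaxControlRequests = 20  // hypothetical controller-queue bound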
>> > > > > >> > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > Thanks
>> > > > > >> > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 4:28 PM,
>> Lucas
>> > > > Wang
>> > > > > <
>> > > > > >> > > > > > > > > > lucasatucla@gmail.com
>> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > wrote:
>> > > > > >> > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > Hi Ted, Dong
>> > > > > >> > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > I've updated the KIP by adding a
>> new
>> > > > > config,
>> > > > > >> > > > instead
>> > > > > >> > > > > of
>> > > > > >> > > > > > > > > reusing
>> > > > > >> > > > > > > > > > > the
>> > > > > >> > > > > > > > > > > > > > > > existing one.
>> > > > > >> > > > > > > > > > > > > > > > Please take another look when you
>> > have
>> > > > > time.
>> > > > > >> > > > Thanks a
>> > > > > >> > > > > > > lot!
>> > > > > >> > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > Lucas
>> > > > > >> > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > On Thu, Jun 14, 2018 at 2:33 PM,
>> Ted
>> > > Yu
>> > > > <
>> > > > > >> > > > > > > > yuzhihong@gmail.com
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > wrote:
>> > > > > >> > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > bq. that's a waste of resource
>> if
>> > > > > control
>> > > > > >> > > request
>> > > > > >> > > > > > rate
>> > > > > >> > > > > > > is
>> > > > > >> > > > > > > > > low
>> > > > > >> > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > I don't know if control request
>> > rate
>> > > > can
>> > > > > >> get
>> > > > > >> > to
>> > > > > >> > > > > > > 100,000,
>> > > > > >> > > > > > > > > > > likely
>> > > > > >> > > > > > > > > > > > > not.
>> > > > > >> > > > > > > > > > > > > > > Then
>> > > > > >> > > > > > > > > > > > > > > > > using the same bound as that
>> for
>> > > data
>> > > > > >> > requests
>> > > > > >> > > > > seems
>> > > > > >> > > > > > > > high.
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 10:13
>> PM,
>> > > > Lucas
>> > > > > >> Wang
>> > > > > >> > <
>> > > > > >> > > > > > > > > > > > > lucasatucla@gmail.com >
>> > > > > >> > > > > > > > > > > > > > > > > wrote:
>> > > > > >> > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > Hi Ted,
>> > > > > >> > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > Thanks for taking a look at
>> this
>> > > > KIP.
>> > > > > >> > > > > > > > > > > > > > > > > > Let's say today the setting
>> of
>> > > > > >> > > > > > "queued.max.requests"
>> > > > > >> > > > > > > in
>> > > > > >> > > > > > > > > > > > cluster A
>> > > > > >> > > > > > > > > > > > > > is
>> > > > > >> > > > > > > > > > > > > > > > > 1000,
>> > > > > >> > > > > > > > > > > > > > > > > > while the setting in cluster
>> B
>> > is
>> > > > > >> 100,000.
>> > > > > >> > > > > > > > > > > > > > > > > > The 100 times difference
>> might
>> > > have
>> > > > > >> > indicated
>> > > > > >> > > > > that
>> > > > > >> > > > > > > > > machines
>> > > > > >> > > > > > > > > > > in
>> > > > > >> > > > > > > > > > > > > > > cluster
>> > > > > >> > > > > > > > > > > > > > > > B
>> > > > > >> > > > > > > > > > > > > > > > > > have larger memory.
>> > > > > >> > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > By reusing the
>> > > > "queued.max.requests",
>> > > > > >> the
>> > > > > >> > > > > > > > > > > controlRequestQueue
>> > > > > >> > > > > > > > > > > > in
>> > > > > >> > > > > > > > > > > > > > > > cluster
>> > > > > >> > > > > > > > > > > > > > > > > B
>> > > > > >> > > > > > > > > > > > > > > > > > automatically
>> > > > > >> > > > > > > > > > > > > > > > > > gets a 100x capacity without
>> > > > > explicitly
>> > > > > >> > > > bothering
>> > > > > >> > > > > > the
>> > > > > >> > > > > > > > > > > > operators.
>> > > > > >> > > > > > > > > > > > > > > > > > I understand the counter
>> > > > > >> > > > > > > > > > > > > > > > > > argument can be that maybe
>> > > > > >> > > > > > > > > > > > > > > > > > that's a waste of resources if
>> > > > > >> > > > > > > > > > > > > > > > > > the control request rate is low,
>> > > > > >> > > > > > > > > > > > > > > > > > and operators may want to
>> > > > > >> > > > > > > > > > > > > > > > > > fine-tune the capacity of the
>> > > > > >> > > > > > > > > > > > > > > > > > controlRequestQueue.
>> > > > > >> > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > I'm ok with either approach, and
>> > > > > >> > > > > > > > > > > > > > > > > > can change it if you or anyone
>> > > > > >> > > > > > > > > > > > > > > > > > else feels strongly about adding
>> > > > > >> > > > > > > > > > > > > > > > > > the extra config.
>> > > > > >> > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > Thanks,
>> > > > > >> > > > > > > > > > > > > > > > > > Lucas
>> > > > > >> > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 3:11
>> PM,
>> > > Ted
>> > > > > Yu
>> > > > > >> <
>> > > > > >> > > > > > > > > > yuzhihong@gmail.com
>> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > wrote:
>> > > > > >> > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > Lucas:
>> > > > > >> > > > > > > > > > > > > > > > > > > Under Rejected Alternatives,
>> > > > > >> > > > > > > > > > > > > > > > > > > #2, can you elaborate a bit
>> > > > > >> > > > > > > > > > > > > > > > > > > more on why the separate
>> > > > > >> > > > > > > > > > > > > > > > > > > config has a bigger impact?
>> > > > > >> > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > Thanks
>> > > > > >> > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at
>> 2:00
>> > PM,
>> > > > > Dong
>> > > > > >> > Lin <
>> > > > > >> > > > > > > > > > > > lindong28@gmail.com
>> > > > > >> > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > wrote:
>> > > > > >> > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > > Hey Luca,
>> > > > > >> > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > > Thanks for the KIP. Looks
>> > good
>> > > > > >> overall.
>> > > > > >> > > > Some
>> > > > > >> > > > > > > > > comments
>> > > > > >> > > > > > > > > > > > below:
>> > > > > >> > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > > - We usually specify the
>> > full
>> > > > > mbean
>> > > > > >> for
>> > > > > >> > > the
>> > > > > >> > > > > new
>> > > > > >> > > > > > > > > metrics
>> > > > > >> > > > > > > > > > > in
>> > > > > >> > > > > > > > > > > > > the
>> > > > > >> > > > > > > > > > > > > > > KIP.
>> > > > > >> > > > > > > > > > > > > > > > > Can
>> > > > > >> > > > > > > > > > > > > > > > > > > you
>> > > > > >> > > > > > > > > > > > > > > > > > > > specify it in the Public
>> > > > Interface
>> > > > > >> > > section
>> > > > > >> > > > > > > similar
>> > > > > >> > > > > > > > > to
>> > > > > >> > > > > > > > > > > > KIP-237
>> > > > > >> > > > > > > > > > > > > > > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics>?
>> > > > > >> > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > > - Maybe we could follow
>> the
>> > > same
>> > > > > >> > pattern
>> > > > > >> > > as
>> > > > > >> > > > > > > KIP-153
>> > > > > >> > > > > > > > > > > > > > > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric>,
>> > > > > >> > > > > > > > > > > > > > > > > > > > where we keep the
>> existing
>> > > > sensor
>> > > > > >> name
>> > > > > >> > > > > > > > > "BytesInPerSec"
>> > > > > >> > > > > > > > > > > and
>> > > > > >> > > > > > > > > > > > > add
>> > > > > >> > > > > > > > > > > > > > a
>> > > > > >> > > > > > > > > > > > > > > > new
>> > > > > >> > > > > > > > > > > > > > > > > > > sensor
>> > > > > >> > > > > > > > > > > > > > > > > > > >
>> "ReplicationBytesInPerSec",
>> > > > rather
>> > > > > >> than
>> > > > > >> > > > > > replacing
>> > > > > >> > > > > > > > > the
>> > > > > >> > > > > > > > > > > > sensor
>> > > > > >> > > > > > > > > > > > > > > name "
>> > > > > >> > > > > > > > > > > > > > > > > > > > BytesInPerSec" with e.g.
>> > > > > >> > > > > "ClientBytesInPerSec".
>> > > > > >> > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > > - It seems that the KIP
>> > > changes
>> > > > > the
>> > > > > >> > > > semantics
>> > > > > >> > > > > > of
>> > > > > >> > > > > > > > the
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > > broker
>> > > > > >> > > > > > > > > > > > > > > config
>> > > > > >> > > > > > > > > > > > > > > > > > > > "queued.max.requests"
>> > because
>> > > > the
>> > > > > >> > number
>> > > > > >> > > of
>> > > > > >> > > > > > total
>> > > > > >> > > > > > > > > > > requests
>> > > > > >> > > > > > > > > > > > > > queued
>> > > > > >> > > > > > > > > > > > > > > > in
>> > > > > >> > > > > > > > > > > > > > > > > > the
>> > > > > >> > > > > > > > > > > > > > > > > > > > broker will no longer be
>> > > > > >> > > > > > > > > > > > > > > > > > > > bounded by
>> > > > > >> > > > > > > > > > > > > > > > > > > > "queued.max.requests". This
>> > > > > >> > > > > > > > > > > > > > > > > > probably
>> > > > > >> > > > > > > > > > > > > > > > > > > > needs to be specified in
>> the
>> > > > > Public
>> > > > > >> > > > > Interfaces
>> > > > > >> > > > > > > > > section
>> > > > > >> > > > > > > > > > > for
>> > > > > >> > > > > > > > > > > > > > > > > discussion.
>> > > > > >> > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > > Thanks,
>> > > > > >> > > > > > > > > > > > > > > > > > > > Dong
>> > > > > >> > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at
>> > 12:45
>> > > > PM,
>> > > > > >> Lucas
>> > > > > >> > > > Wang
>> > > > > >> > > > > <
>> > > > > >> > > > > > > > > > > > > > > > lucasatucla@gmail.com >
>> > > > > >> > > > > > > > > > > > > > > > > > > > wrote:
>> > > > > >> > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > > > Hi Kafka experts,
>> > > > > >> > > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > > > I created KIP-291 to
>> add a
>> > > > > >> separate
>> > > > > >> > > queue
>> > > > > >> > > > > for
>> > > > > >> > > > > > > > > > > controller
>> > > > > >> > > > > > > > > > > > > > > > requests:
>> > > > > >> > > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%3A+Have+separate+queues+for+control+requests+and+data+requests
>> > > > > >> > > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > > > Can you please take a
>> look
>> > > and
>> > > > > >> let me
>> > > > > >> > > > know
>> > > > > >> > > > > > your
>> > > > > >> > > > > > > > > > > feedback?
>> > > > > >> > > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > > > Thanks a lot for your
>> > time!
>> > > > > >> > > > > > > > > > > > > > > > > > > > > Regards,
>> > > > > >> > > > > > > > > > > > > > > > > > > > > Lucas
>> > > > > >> > > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > > >
>> > > > > >> > > > > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > >
>> > > > > >> > > > > > >
>> > > > > >> > > > > >
>> > > > > >> > > > >
>> > > > > >> > > >
>> > > > > >> > >
>> > > > > >> >
>> > > > > >>
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> >
>> > --
>> > -Regards,
>> > Mayuresh R. Gharat
>> > (862) 250-7125
>> >
>>
>
>
