kafka-dev mailing list archives

From Jun Rao <...@confluent.io>
Subject Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests
Date Tue, 03 Jul 2018 21:07:24 GMT
Hi, Lucas, Dong,

If all disks on a broker are slow, one probably should just kill the
broker. In that case, this KIP may not help. If only one of the disks on a
broker is slow, one may want to fail that disk and move the leaders on that
disk to other brokers. In that case, being able to process the LeaderAndIsr
requests faster will potentially help the producers recover quicker.

Thanks,

Jun

On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin <lindong28@gmail.com> wrote:

> Hey Lucas,
>
> Thanks for the reply. Some follow up questions below.
>
> Regarding 1, if each ProduceRequest covers 20 partitions that are randomly
> distributed across all partitions, then each ProduceRequest will likely
> cover some partitions for which the broker is still leader after it quickly
> processes the
> LeaderAndIsrRequest. The broker will then still be slow in processing these
> ProduceRequests, and the request latency will still be very high with this
> KIP. It seems that most ProduceRequests will still time out after 30
> seconds. Is this understanding correct?
>
> Regarding 2, if most ProduceRequests still time out after 30 seconds, then
> it is less clear how this KIP reduces average produce latency. Can you
> clarify what metrics can be improved by this KIP?
>
> I am not sure why a system operator directly cares about the number of
> truncated messages. Do you mean this KIP can improve average throughput or
> reduce message duplication? It would be good to understand this.
>
> Thanks,
> Dong
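Dong's point 1 above is essentially a coverage probability: if each ProduceRequest touches 20 randomly chosen partitions and the broker keeps leadership for a fraction p of all partitions, the chance that a request still contains at least one slow, still-led partition is 1 - (1 - p)^20. A quick back-of-envelope sketch (all numbers below are hypothetical, not from the Google doc):

```python
# Back-of-envelope check of Dong's point 1 (hypothetical numbers).
# p: fraction of all partitions the slow broker still leads after it
#    processes the LeaderAndIsrRequests (e.g. it keeps 1K of 10K leaders).
# k: partitions covered by one ProduceRequest.
p, k = 0.10, 20

# Probability that a ProduceRequest touches at least one still-led
# (and therefore still slowly processed) partition:
prob_slow = 1 - (1 - p) ** k
print(f"P(request still hits a slow partition) = {prob_slow:.3f}")  # ~0.878
```

So under these assumed numbers, most ProduceRequests would indeed still touch a slow partition, which is the crux of the question.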
>
>
>
>
>
> On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang <lucasatucla@gmail.com> wrote:
>
> > Hi Dong,
> >
> > Thanks for your valuable comments. Please see my reply below.
> >
> > 1. The Google doc showed only 1 partition. Now let's consider a more
> > common scenario where broker0 is the leader of many partitions, and let's
> > say for some reason its IO becomes slow.
> > The number of leader partitions on broker0 is so large, say 10K, that the
> > cluster is skewed, and the operator would like to shift the leadership
> > for a lot of partitions, say 9K, to other brokers, either manually or
> > through some service like cruise control.
> > With this KIP, not only will the leadership transitions finish more
> > quickly, helping the cluster itself become more balanced, but all
> > existing producers corresponding to the 9K partitions will get the errors
> > relatively quickly, rather than relying on their timeout, thanks to the
> > batched async ZK operations.
> > To me it's a useful feature to have during such troublesome times.
> >
> >
> > 2. The experiments in the Google doc have shown that with this KIP many
> > producers receive an explicit NotLeaderForPartition error, based on which
> > they retry immediately.
> > Therefore the latency (~14 seconds + quick retry) for their single
> > message is much smaller compared with the case of timing out without the
> > KIP (30 seconds for timing out + quick retry).
> > One might argue that reducing the timeout on the producer side can
> > achieve the same result, yet reducing the timeout has its own
> > drawbacks[1].
> >
> > Also *IF* there were a metric to show the number of truncated messages on
> > brokers, with the experiments done in the Google doc, it should be easy
> > to see that a lot fewer messages need to be truncated on broker0, since
> > the up-to-date metadata avoids appending messages in subsequent PRODUCE
> > requests. If we ask a system operator whether they prefer fewer wasteful
> > IOs, I bet the answer is most likely yes.
> >
> > 3. To answer your question, I think it might be helpful to construct
> > some formulas.
> > To simplify the modeling, I'm going back to the case where there is only
> > ONE partition involved.
> > Following the experiments in the Google doc, let's say broker0 becomes
> > the follower at time t0, and after t0 there are still N produce requests
> > in its request queue.
> > With the up-to-date metadata brought by this KIP, broker0 can reply with
> > a NotLeaderForPartition error; let's use M1 to denote the average
> > processing time of replying with such an error message.
> > Without this KIP, the broker will need to append messages to segments,
> > which may trigger a flush to disk; let's use M2 to denote the average
> > processing time for such logic.
> > Then the average extra latency incurred without this KIP is
> > N * (M2 - M1) / 2.
> >
> > In practice, M2 should always be larger than M1, which means that as
> > long as N is positive, we would see improvements in the average latency.
> > There does not need to be a significant backlog of requests in the
> > request queue, or severe degradation of disk performance, to see the
> > improvement.
> >
> > Regards,
> > Lucas
> >
> >
> > [1] For instance, reducing the timeout on the producer side can trigger
> > unnecessary duplicate requests when the corresponding leader broker is
> > overloaded, exacerbating the situation.
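Lucas's N * (M2 - M1) / 2 estimate in point 3 can be checked with a small sketch (M1, M2, and N below are made-up values, not measurements from the Google doc): the i-th queued request waits for i processings, so its extra latency without the KIP is i * (M2 - M1), and the average over N requests is (N + 1)/2 * (M2 - M1), which is approximately N * (M2 - M1) / 2 for large N.

```python
# Sketch of the queueing model in Lucas's point 3 (hypothetical numbers).
N = 1000      # produce requests still queued when broker0 becomes follower
M1 = 0.0005   # assumed avg time to reply NotLeaderForPartition (seconds)
M2 = 0.0100   # assumed avg time to append + possibly flush (seconds)

# The i-th request in the queue waits for i processings, so its extra
# latency without the KIP is i * (M2 - M1).
extra = [i * (M2 - M1) for i in range(1, N + 1)]

exact = sum(extra) / N        # equals (N + 1) / 2 * (M2 - M1)
approx = N * (M2 - M1) / 2    # the formula from the email

print(f"exact average extra latency: {exact:.4f}s")
print(f"N*(M2-M1)/2 approximation:  {approx:.4f}s")
```

The two values differ only by (M2 - M1)/2, confirming the approximation is tight whenever N is large.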
> >
> > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin <lindong28@gmail.com> wrote:
> >
> > > Hey Lucas,
> > >
> > > Thanks much for the detailed documentation of the experiment.
> > >
> > > Initially I also think having a separate queue for controller requests
> > > is useful because, as you mentioned in the summary section of the
> > > Google doc, controller requests are generally more important than data
> > > requests and we probably want controller requests to be processed
> > > sooner. But then Eno has two very good questions which I am not sure
> > > the Google doc has answered explicitly. Could you help with the
> > > following questions?
> > >
> > > 1) It is not very clear what the actual benefit of KIP-291 is to
> > > users. The experiment setup in the Google doc simulates the scenario
> > > where the broker is very slow handling ProduceRequests due to e.g. a
> > > slow disk. It currently assumes that there is only 1 partition. But in
> > > the common scenario, it is probably reasonable to assume that there are
> > > many other partitions that are also actively produced to, and
> > > ProduceRequests to these partitions also take e.g. 2 seconds to be
> > > processed. So even if broker0 can become follower for partition 0 soon,
> > > it probably still needs to slowly process the ProduceRequests in the
> > > queue because these ProduceRequests cover other partitions.
> > > Thus most ProduceRequests will still time out after 30 seconds and most
> > > clients will still likely time out after 30 seconds. Then it is not
> > > obvious what the benefit to the client is, since the client will time
> > > out after 30 seconds before possibly re-connecting to broker1, with or
> > > without KIP-291. Did I miss something here?
> > >
> > > 2) I guess Eno is asking for the specific benefits of this KIP to the
> > > user or system administrator, e.g. whether this KIP decreases average
> > > latency, 999th percentile latency, the probability of exceptions
> > > exposed to the client, etc. It is probably useful to clarify this.
> > >
> > > 3) Does this KIP help improve user experience only when there is an
> > > issue with a broker, e.g. a significant backlog in the request queue
> > > due to a slow disk as described in the Google doc? Or is this KIP also
> > > useful when there is no ongoing issue in the cluster? It might be
> > > helpful to clarify this to understand the benefit of this KIP.
> > >
> > >
> > > Thanks much,
> > > Dong
> > >
> > >
> > >
> > >
> > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > >
> > > > Hi Eno,
> > > >
> > > > Sorry for the delay in getting the experiment results.
> > > > Here is a link to the positive impact achieved by implementing the
> > > > proposed change:
> > > > https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhWFWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > > > Please take a look when you have time and let me know your feedback.
> > > >
> > > > Regards,
> > > > Lucas
> > > >
> > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha <kafka@harsha.io> wrote:
> > > >
> > > > > Thanks for the pointer. Will take a look; it might suit our
> > > > > requirements better.
> > > > >
> > > > > Thanks,
> > > > > Harsha
> > > > >
> > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Hi Harsha,
> > > > > >
> > > > > > If I understand correctly, the replication quota mechanism
> proposed
> > > in
> > > > > > KIP-73 can be helpful in that scenario.
> > > > > > Have you tried it out?
> > > > > >
> > > > > > Thanks,
> > > > > > Lucas
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha <kafka@harsha.io> wrote:
> > > > > >
> > > > > > > Hi Lucas,
> > > > > > > One more question: any thoughts on making this configurable,
> > > > > > > and also allowing a subset of data requests to be prioritized?
> > > > > > > For example, we notice in our cluster that when we take out a
> > > > > > > broker and bring in a new one, it will try to become a follower
> > > > > > > and have a lot of fetch requests to other leaders in the
> > > > > > > cluster. This will negatively affect the application/client
> > > > > > > requests. We are also exploring a similar solution to
> > > > > > > de-prioritize fetch requests if a new replica comes in; we are
> > > > > > > ok with the replica taking time, but the leaders should
> > > > > > > prioritize the client requests.
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Harsha
> > > > > > >
> > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang wrote:
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Hi Eno,
> > > > > > > >
> > > > > > > > Sorry for the delayed response.
> > > > > > > > - I haven't implemented the feature yet, so no experimental
> > > > > > > > results so far. And I plan to test it out in the following
> > > > > > > > days.
> > > > > > > >
> > > > > > > > - You are absolutely right that the priority queue does not
> > > > > > > > completely prevent data requests being processed ahead of
> > > > > > > > controller requests.
> > > > > > > > That being said, I expect it to greatly mitigate the effect
> > > > > > > > of stale metadata.
> > > > > > > > In any case, I'll try it out and post the results when I have
> > > > > > > > them.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Lucas
> > > > > > > >
> > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska <eno.thereska@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Lucas,
> > > > > > > > >
> > > > > > > > > Sorry for the delay, just had a look at this. A couple of
> > > > > > > > > questions:
> > > > > > > > > - did you notice any positive change after implementing
> > > > > > > > > this KIP? I'm wondering if you have any experimental
> > > > > > > > > results that show the benefit of the two queues.
> > > > > > > > >
> > > > > > > > > - priority is usually not sufficient in addressing the
> > > > > > > > > problem the KIP identifies. Even with priority queues, you
> > > > > > > > > will sometimes (often?) have the case that data plane
> > > > > > > > > requests will be ahead of the control plane requests. This
> > > > > > > > > happens because the system might have already started
> > > > > > > > > processing the data plane requests before the control plane
> > > > > > > > > ones arrived. So it would be good to know what % of the
> > > > > > > > > problem this KIP addresses.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > Eno
> > > > > > > > >
> > > > > > > >
> > > > > > > >
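Eno's caveat above (two queues prioritize but do not preempt) can be illustrated with a toy model. This is not the actual KafkaRequestHandler code, just a sketch of the proposed two-queue behavior:

```python
import queue

control_q: queue.Queue = queue.Queue()  # LeaderAndIsr, UpdateMetadata, ...
data_q: queue.Queue = queue.Queue()     # Produce, Fetch, ...

def next_request():
    """Handler threads drain the control queue first, falling back to the
    data queue. Note there is no preemption: a data request a handler has
    already dequeued runs to completion even if a control request arrives
    a moment later -- the gap Eno points out."""
    try:
        return control_q.get_nowait()
    except queue.Empty:
        try:
            return data_q.get_nowait()
        except queue.Empty:
            return None

data_q.put("ProduceRequest-1")
control_q.put("LeaderAndIsrRequest")
# Even though the produce request arrived first, the control request is
# served first:
print(next_request())  # LeaderAndIsrRequest
print(next_request())  # ProduceRequest-1
```

The model makes the limitation concrete: prioritization only helps for requests still sitting in a queue, never for those already being processed.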
> > > > > > > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Change looks good.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Ted,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the suggestion. I've updated the KIP.
> > > > > > > > > > > Please take another look.
> > > > > > > > > > >
> > > > > > > > > > > Lucas
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Currently in KafkaConfig.scala:
> > > > > > > > > > > >
> > > > > > > > > > > > val QueuedMaxRequests = 500
> > > > > > > > > > > >
> > > > > > > > > > > > It would be good if you can include the default
> > > > > > > > > > > > value for this new config in the KIP.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Jun 14, 2018 at 4:28 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Ted, Dong
> > > > > > > > > > > > >
> > > > > > > > > > > > > I've updated the KIP by adding a new config,
> > > > > > > > > > > > > instead of reusing the existing one.
> > > > > > > > > > > > > Please take another look when you have time.
> > > > > > > > > > > > > Thanks a lot!
> > > > > > > > > > > > >
> > > > > > > > > > > > > Lucas
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > bq. that's a waste of resource if control
> > > > > > > > > > > > > > request rate is low
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I don't know if control request rate can get to
> > > > > > > > > > > > > > 100,000, likely not. Then using the same bound as
> > > > > > > > > > > > > > that for data requests seems high.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 10:13 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Ted,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for taking a look at this KIP.
> > > > > > > > > > > > > > > Let's say today the setting of
> > > > > > > > > > > > > > > "queued.max.requests" in cluster A is 1000,
> > > > > > > > > > > > > > > while the setting in cluster B is 100,000.
> > > > > > > > > > > > > > > The 100 times difference might have indicated
> > > > > > > > > > > > > > > that machines in cluster B have larger memory.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > By reusing "queued.max.requests", the
> > > > > > > > > > > > > > > controlRequestQueue in cluster B automatically
> > > > > > > > > > > > > > > gets a 100x capacity without explicitly
> > > > > > > > > > > > > > > bothering the operators.
> > > > > > > > > > > > > > > I understand the counter argument can be that
> > > > > > > > > > > > > > > maybe that's a waste of resource if the control
> > > > > > > > > > > > > > > request rate is low, and operators may want to
> > > > > > > > > > > > > > > fine-tune the capacity of the
> > > > > > > > > > > > > > > controlRequestQueue.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I'm ok with either approach, and can change it
> > > > > > > > > > > > > > > if you or anyone else feels strongly about
> > > > > > > > > > > > > > > adding the extra config.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Lucas
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 3:11 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Lucas:
> > > > > > > > > > > > > > > > Under Rejected Alternatives, #2, can you
> > > > > > > > > > > > > > > > elaborate a bit more on why the separate
> > > > > > > > > > > > > > > > config has bigger impact?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 2:00 PM, Dong Lin <lindong28@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hey Lucas,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for the KIP. Looks good overall.
> > > > > > > > > > > > > > > > > Some comments below:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - We usually specify the full mbean for
> > > > > > > > > > > > > > > > > the new metrics in the KIP. Can you
> > > > > > > > > > > > > > > > > specify it in the Public Interface section
> > > > > > > > > > > > > > > > > similar to KIP-237
> > > > > > > > > > > > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics>?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - Maybe we could follow the same pattern
> > > > > > > > > > > > > > > > > as KIP-153
> > > > > > > > > > > > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric>,
> > > > > > > > > > > > > > > > > where we keep the existing sensor name
> > > > > > > > > > > > > > > > > "BytesInPerSec" and add a new sensor
> > > > > > > > > > > > > > > > > "ReplicationBytesInPerSec", rather than
> > > > > > > > > > > > > > > > > replacing the sensor name "BytesInPerSec"
> > > > > > > > > > > > > > > > > with e.g. "ClientBytesInPerSec".
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - It seems that the KIP changes the
> > > > > > > > > > > > > > > > > semantics of the broker config
> > > > > > > > > > > > > > > > > "queued.max.requests", because the total
> > > > > > > > > > > > > > > > > number of requests queued in the broker
> > > > > > > > > > > > > > > > > will no longer be bounded by
> > > > > > > > > > > > > > > > > "queued.max.requests". This probably needs
> > > > > > > > > > > > > > > > > to be specified in the Public Interfaces
> > > > > > > > > > > > > > > > > section for discussion.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > Dong
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 12:45 PM, Lucas Wang <lucasatucla@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi Kafka experts,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I created KIP-291 to add a separate
> > > > > > > > > > > > > > > > > > queue for controller requests:
> > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%3A+Have+separate+queues+for+control+requests+and+data+requests
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Can you please take a look and let me
> > > > > > > > > > > > > > > > > > know your feedback?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks a lot for your time!
> > > > > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > > > > Lucas
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
