From: Jun Rao
To: dev@kafka.apache.org
Date: Tue, 3 Jul 2018 14:07:24 -0700
Subject: Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests
Hi, Lucas, Dong,

If all disks on a broker are slow, one probably should just kill the broker; in that case, this KIP may not help. If only one of the disks on a broker is slow, one may want to fail that disk and move the leaders on that disk to other brokers. In that case, being able to process the LeaderAndIsr requests faster will potentially help the producers recover more quickly.

Thanks,

Jun

On Mon, Jul 2, 2018 at 7:56 PM, Dong Lin wrote:

> Hey Lucas,
>
> Thanks for the reply. Some follow-up questions below.
>
> Regarding 1, if each ProduceRequest covers 20 partitions that are randomly
> distributed across all partitions, then each ProduceRequest will likely
> cover some partitions for which the broker is still the leader after it
> quickly processes the LeaderAndIsrRequest. The broker will then still be
> slow in processing these ProduceRequests, and request latency will still
> be very high with this KIP. It seems that most ProduceRequests will still
> time out after 30 seconds. Is this understanding correct?
>
> Regarding 2, if most ProduceRequests will still time out after 30 seconds,
> then it is less clear how this KIP reduces average produce latency. Can
> you clarify which metrics can be improved by this KIP?
>
> I am not sure why a system operator directly cares about the number of
> truncated messages. Do you mean this KIP can improve average throughput or
> reduce message duplication? It would be good to understand this.
>
> Thanks,
> Dong
>
> On Tue, 3 Jul 2018 at 7:12 AM Lucas Wang wrote:
>
> > Hi Dong,
> >
> > Thanks for your valuable comments. Please see my reply below.
> >
> > 1. The Google doc showed only 1 partition. Now let's consider a more
> > common scenario where broker0 is the leader of many partitions, and
> > let's say for some reason its IO becomes slow.
> > The number of leader partitions on broker0 is so large, say 10K, that
> > the cluster is skewed, and the operator would like to shift the
> > leadership for a lot of partitions, say 9K, to other brokers, either
> > manually or through some service like Cruise Control.
> > With this KIP, not only will the leadership transitions finish more
> > quickly, helping the cluster itself become more balanced, but all
> > existing producers corresponding to the 9K partitions will get the
> > errors relatively quickly, rather than relying on their timeouts,
> > thanks to the batched async ZK operations.
> > To me it's a useful feature to have during such troublesome times.
> >
> > 2. The experiments in the Google doc have shown that with this KIP many
> > producers receive an explicit NotLeaderForPartition error, based on
> > which they retry immediately. Therefore the latency (~14 seconds + quick
> > retry) for their single message is much smaller compared with the case
> > of timing out without the KIP (30 seconds for timing out + quick retry).
> > One might argue that reducing the timeout on the producer side can
> > achieve the same result, yet reducing the timeout has its own
> > drawbacks [1].
> >
> > Also *IF* there were a metric to show the number of truncated messages
> > on brokers, with the experiments done in the Google doc, it should be
> > easy to see that a lot fewer messages need to be truncated on broker0,
> > since the up-to-date metadata avoids appending messages in subsequent
> > PRODUCE requests. If we talk to a system operator and ask whether they
> > prefer fewer wasteful IOs, I bet the answer is most likely yes.
> >
> > 3. To answer your question, I think it might be helpful to construct
> > some formulas. To simplify the modeling, I'm going back to the case
> > where there is only ONE partition involved.
> > Following the experiments in the Google doc, let's say broker0 becomes
> > the follower at time t0, and after t0 there are still N produce requests
> > in its request queue.
> > With the up-to-date metadata brought by this KIP, broker0 can reply with
> > a NotLeaderForPartition exception; let's use M1 to denote the average
> > processing time of replying with such an error message.
> > Without this KIP, the broker will need to append messages to segments,
> > which may trigger a flush to disk; let's use M2 to denote the average
> > processing time for such logic.
> > Then the average extra latency incurred without this KIP is
> > N * (M2 - M1) / 2.
> >
> > In practice, M2 should always be larger than M1, which means that as
> > long as N is positive, we would see an improvement in the average
> > latency. There does not need to be a significant backlog of requests in
> > the request queue, or severe degradation of disk performance, for the
> > improvement to show.
> >
> > Regards,
> > Lucas
> >
> > [1] For instance, reducing the timeout on the producer side can trigger
> > unnecessary duplicate requests when the corresponding leader broker is
> > overloaded, exacerbating the situation.
> >
> > On Sun, Jul 1, 2018 at 9:18 PM, Dong Lin wrote:
> >
> > > Hey Lucas,
> > >
> > > Thanks much for the detailed documentation of the experiment.
> > >
> > > Initially I also thought that having a separate queue for controller
> > > requests is useful because, as you mentioned in the summary section of
> > > the Google doc, controller requests are generally more important than
> > > data requests and we probably want controller requests to be processed
> > > sooner. But then Eno had two very good questions which I am not sure
> > > the Google doc has answered explicitly. Could you help with the
> > > following questions?
> > >
> > > 1) It is not very clear what the actual benefit of KIP-291 is to users.
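[Editor's note: Lucas's back-of-envelope model can be checked with a tiny FIFO calculation. The values of N, M1, and M2 below are made-up illustrations, not measurements from the Google doc.]

```python
def avg_completion_time(n, service_time):
    """Average completion time of n queued FIFO requests when each takes
    service_time seconds: request i finishes at i * service_time."""
    return sum(i * service_time for i in range(1, n + 1)) / n  # = service_time * (n + 1) / 2

# Made-up values: 1 ms to reply NotLeaderForPartition (M1) vs 10 ms to
# append and possibly flush a segment (M2), with N = 1000 requests queued.
n, m1, m2 = 1000, 0.001, 0.010
extra = avg_completion_time(n, m2) - avg_completion_time(n, m1)
approx = n * (m2 - m1) / 2          # the N * (M2 - M1) / 2 formula above
print(f"extra latency: exact {extra:.4f}s, formula {approx:.4f}s")
```

The exact average is (M2 - M1) * (N + 1) / 2, which the formula approximates to within 1/N relative error; either way the extra latency grows linearly in both the backlog N and the per-request cost gap M2 - M1.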
> > > The experiment setup in the Google doc simulates the scenario where a
> > > broker is very slow handling ProduceRequests, due to e.g. a slow disk.
> > > It currently assumes that there is only 1 partition. But in the common
> > > scenario, it is probably reasonable to assume that there are many
> > > other partitions that are also actively produced to, and
> > > ProduceRequests to these partitions also take e.g. 2 seconds to be
> > > processed. So even if broker0 can become follower for partition 0
> > > soon, it probably still needs to slowly process the ProduceRequests in
> > > the queue, because these ProduceRequests cover other partitions. Thus
> > > most ProduceRequests will still time out after 30 seconds, and most
> > > clients will still likely time out after 30 seconds. Then it is not
> > > obvious what the benefit to the client is, since the client will time
> > > out after 30 seconds before possibly re-connecting to broker1, with or
> > > without KIP-291. Did I miss something here?
> > >
> > > 2) I guess Eno is asking for the specific benefits of this KIP to
> > > users or system administrators, e.g. whether this KIP decreases
> > > average latency, 999th percentile latency, the probability of
> > > exceptions exposed to clients, etc. It is probably useful to clarify
> > > this.
> > >
> > > 3) Does this KIP help improve user experience only when there is an
> > > issue with a broker, e.g. a significant backlog in the request queue
> > > due to a slow disk as described in the Google doc? Or is this KIP also
> > > useful when there is no ongoing issue in the cluster? It might be
> > > helpful to clarify this to understand the benefit of this KIP.
> > >
> > > Thanks much,
> > > Dong
> > >
> > > On Fri, Jun 29, 2018 at 2:58 PM, Lucas Wang wrote:
> > >
> > > > Hi Eno,
> > > >
> > > > Sorry for the delay in getting the experiment results.
> > > > Here is a link to the positive impact achieved by implementing the
> > > > proposed change:
> > > > https://docs.google.com/document/d/1ge2jjp5aPTBber6zaIT9AdhWFWUENJ3JO6Zyu4f9tgQ/edit?usp=sharing
> > > > Please take a look when you have time and let me know your feedback.
> > > >
> > > > Regards,
> > > > Lucas
> > > >
> > > > On Tue, Jun 26, 2018 at 9:52 AM, Harsha wrote:
> > > >
> > > > > Thanks for the pointer. Will take a look; it might suit our
> > > > > requirements better.
> > > > >
> > > > > Thanks,
> > > > > Harsha
> > > > >
> > > > > On Mon, Jun 25th, 2018 at 2:52 PM, Lucas Wang
> > > > > <lucasatucla@gmail.com> wrote:
> > > > >
> > > > > > Hi Harsha,
> > > > > >
> > > > > > If I understand correctly, the replication quota mechanism
> > > > > > proposed in KIP-73 can be helpful in that scenario.
> > > > > > Have you tried it out?
> > > > > >
> > > > > > Thanks,
> > > > > > Lucas
> > > > > >
> > > > > > On Sun, Jun 24, 2018 at 8:28 AM, Harsha <kafka@harsha.io> wrote:
> > > > > >
> > > > > > > Hi Lucas,
> > > > > > > One more question: any thoughts on making this configurable,
> > > > > > > and also allowing a subset of data requests to be prioritized?
> > > > > > > For example, we notice in our cluster that when we take out a
> > > > > > > broker and bring in a new one, it will try to become a
> > > > > > > follower and send a lot of fetch requests to other leaders in
> > > > > > > the cluster. This will negatively affect the application/client
> > > > > > > requests. We are also exploring a similar solution to
> > > > > > > de-prioritize fetch requests when a new replica comes in; we
> > > > > > > are OK with the replica taking time to catch up, but the
> > > > > > > leaders should prioritize the client requests.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Harsha
> > > > > > >
> > > > > > > On Fri, Jun 22nd, 2018 at 11:35 AM Lucas Wang wrote:
> > > > > > >
> > > > > > > > Hi Eno,
> > > > > > > >
> > > > > > > > Sorry for the delayed response.
> > > > > > > > - I haven't implemented the feature yet, so no experimental
> > > > > > > > results so far. I plan to test it out in the following days.
> > > > > > > >
> > > > > > > > - You are absolutely right that the priority queue does not
> > > > > > > > completely prevent data requests from being processed ahead
> > > > > > > > of controller requests. That being said, I expect it to
> > > > > > > > greatly mitigate the effect of stale metadata.
> > > > > > > > In any case, I'll try it out and post the results when I
> > > > > > > > have them.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Lucas
> > > > > > > >
> > > > > > > > On Wed, Jun 20, 2018 at 5:44 AM, Eno Thereska
> > > > > > > > <eno.thereska@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Lucas,
> > > > > > > > >
> > > > > > > > > Sorry for the delay, just had a look at this. A couple of
> > > > > > > > > questions:
> > > > > > > > > - did you notice any positive change after implementing
> > > > > > > > > this KIP? I'm wondering if you have any experimental
> > > > > > > > > results that show the benefit of the two queues.
> > > > > > > > >
> > > > > > > > > - priority is usually not sufficient in addressing the
> > > > > > > > > problem the KIP identifies. Even with priority queues, you
> > > > > > > > > will sometimes (often?) have the case that data plane
> > > > > > > > > requests will be ahead of the control plane requests.
> > > > > > > > > This happens because the system might have already started
> > > > > > > > > processing the data plane requests before the control
> > > > > > > > > plane ones arrived. So it would be good to know what % of
> > > > > > > > > the problem this KIP addresses.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > Eno
> > > > > > > > >
> > > > > > > > > On Fri, Jun 15, 2018 at 4:44 PM, Ted Yu
> > > > > > > > > <yuzhihong@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Change looks good.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > On Fri, Jun 15, 2018 at 8:42 AM, Lucas Wang
> > > > > > > > > > <lucasatucla@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Ted,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the suggestion. I've updated the KIP.
> > > > > > > > > > > Please take another look.
> > > > > > > > > > >
> > > > > > > > > > > Lucas
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jun 14, 2018 at 6:34 PM, Ted Yu
> > > > > > > > > > > <yuzhihong@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Currently in KafkaConfig.scala:
> > > > > > > > > > > >
> > > > > > > > > > > >   val QueuedMaxRequests = 500
> > > > > > > > > > > >
> > > > > > > > > > > > It would be good if you can include the default
> > > > > > > > > > > > value for this new config in the KIP.
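[Editor's note: Eno's caveat, that preferring control requests at dequeue time cannot recall a data request that a handler has already picked up, can be sketched roughly as follows. This is illustrative Python, not Kafka's actual request handler.]

```python
import queue

# Two queues as proposed in KIP-291: handlers prefer the control-plane queue.
control_q = queue.Queue()   # e.g. LeaderAndIsr, UpdateMetadata, StopReplica
data_q = queue.Queue()      # e.g. Produce, Fetch

def next_request():
    """Take a waiting control request if any; otherwise take a data request.
    A data request that has already been dequeued is NOT preempted, which is
    exactly the residual reordering Eno describes."""
    try:
        return control_q.get_nowait()
    except queue.Empty:
        return data_q.get_nowait()

data_q.put("produce-0")          # arrived first
control_q.put("leaderAndIsr")    # arrived later, but jumps ahead
print(next_request())            # leaderAndIsr
print(next_request())            # produce-0
```

If a handler had already called `next_request()` before `leaderAndIsr` arrived, it would be busy with `produce-0`, so prioritization only bounds how many data requests can get ahead, not eliminate reordering entirely.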
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Jun 14, 2018 at 4:28 PM, Lucas Wang
> > > > > > > > > > > > <lucasatucla@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Ted, Dong
> > > > > > > > > > > > >
> > > > > > > > > > > > > I've updated the KIP by adding a new config,
> > > > > > > > > > > > > instead of reusing the existing one.
> > > > > > > > > > > > > Please take another look when you have time.
> > > > > > > > > > > > > Thanks a lot!
> > > > > > > > > > > > >
> > > > > > > > > > > > > Lucas
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Jun 14, 2018 at 2:33 PM, Ted Yu
> > > > > > > > > > > > > <yuzhihong@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > bq. that's a waste of resource if control
> > > > > > > > > > > > > > request rate is low
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I don't know if the control request rate can get
> > > > > > > > > > > > > > to 100,000, likely not. Then using the same
> > > > > > > > > > > > > > bound as that for data requests seems high.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 10:13 PM, Lucas Wang
> > > > > > > > > > > > > > <lucasatucla@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Ted,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for taking a look at this KIP.
> > > > > > > > > > > > > > > Let's say today the setting of
> > > > > > > > > > > > > > > "queued.max.requests" in cluster A is 1000,
> > > > > > > > > > > > > > > while the setting in cluster B is 100,000.
> > > > > > > > > > > > > > > The 100x difference might indicate that
> > > > > > > > > > > > > > > machines in cluster B have larger memory.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > By reusing "queued.max.requests", the
> > > > > > > > > > > > > > > controlRequestQueue in cluster B automatically
> > > > > > > > > > > > > > > gets a 100x capacity without explicitly
> > > > > > > > > > > > > > > bothering the operators.
> > > > > > > > > > > > > > > I understand the counter-argument can be that
> > > > > > > > > > > > > > > maybe that's a waste of resource if the
> > > > > > > > > > > > > > > control request rate is low, and operators may
> > > > > > > > > > > > > > > want to fine-tune the capacity of the
> > > > > > > > > > > > > > > controlRequestQueue.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I'm OK with either approach, and can change it
> > > > > > > > > > > > > > > if you or anyone else feels strongly about
> > > > > > > > > > > > > > > adding the extra config.
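[Editor's note: the capacity trade-off being debated, reusing `queued.max.requests` for the control queue versus giving it a small dedicated bound, can be illustrated with bounded queues. The sizes and the idea of a separate control-queue bound are hypothetical here, purely for illustration; they are not actual Kafka config values.]

```python
import queue

# Hypothetical sizes: cluster B's data-plane bound reused for the control
# queue would give it 100,000 slots; a dedicated bound could be far smaller.
queued_max_requests = 100_000   # existing data-plane bound ("cluster B")
control_queue_bound = 20        # hypothetical dedicated, much smaller bound

control_q = queue.Queue(maxsize=control_queue_bound)
for i in range(control_queue_bound):
    control_q.put_nowait(f"control-{i}")

try:
    control_q.put_nowait("one-too-many")   # bound reached: raises queue.Full
except queue.Full:
    print("control queue full at", control_q.qsize())
```

A small dedicated bound caps the memory the control queue can consume and provides back-pressure on the controller, at the cost of one more knob for operators, which is the trade-off Lucas and Ted are weighing.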
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Lucas
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 3:11 PM, Ted Yu
> > > > > > > > > > > > > > > <yuzhihong@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Lucas:
> > > > > > > > > > > > > > > > Under Rejected Alternatives, #2, can you
> > > > > > > > > > > > > > > > elaborate a bit more on why the separate
> > > > > > > > > > > > > > > > config has a bigger impact?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 2:00 PM, Dong Lin
> > > > > > > > > > > > > > > > <lindong28@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hey Lucas,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for the KIP. Looks good overall.
> > > > > > > > > > > > > > > > > Some comments below:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - We usually specify the full mbean for
> > > > > > > > > > > > > > > > > the new metrics in the KIP. Can you
> > > > > > > > > > > > > > > > > specify it in the Public Interfaces
> > > > > > > > > > > > > > > > > section, similar to KIP-237
> > > > > > > > > > > > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics>?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - Maybe we could follow the same pattern
> > > > > > > > > > > > > > > > > as KIP-153
> > > > > > > > > > > > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-153%3A+Include+only+client+traffic+in+BytesOutPerSec+metric>,
> > > > > > > > > > > > > > > > > where we keep the existing sensor name
> > > > > > > > > > > > > > > > > "BytesInPerSec" and add a new sensor
> > > > > > > > > > > > > > > > > "ReplicationBytesInPerSec", rather than
> > > > > > > > > > > > > > > > > replacing the sensor name "BytesInPerSec"
> > > > > > > > > > > > > > > > > with e.g. "ClientBytesInPerSec".
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > - It seems that the KIP changes the
> > > > > > > > > > > > > > > > > semantics of the broker config
> > > > > > > > > > > > > > > > > "queued.max.requests", because the total
> > > > > > > > > > > > > > > > > number of requests queued in the broker
> > > > > > > > > > > > > > > > > will no longer be bounded by
> > > > > > > > > > > > > > > > > "queued.max.requests". This probably needs
> > > > > > > > > > > > > > > > > to be specified in the Public Interfaces
> > > > > > > > > > > > > > > > > section for discussion.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > Dong
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Jun 13, 2018 at 12:45 PM, Lucas
> > > > > > > > > > > > > > > > > Wang <lucasatucla@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi Kafka experts,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I created KIP-291 to add a separate
> > > > > > > > > > > > > > > > > > queue for controller requests:
> > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-291%3A+Have+separate+queues+for+control+requests+and+data+requests
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Can you please take a look and let me
> > > > > > > > > > > > > > > > > > know your feedback?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks a lot for your time!
> > > > > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > > > > Lucas