From: "Colin McCabe"
To: dev@kafka.apache.org
Reply-To: dev@kafka.apache.org
Date: Wed, 02 Dec 2020 14:48:47 -0800
Subject: Re: [DISCUSS] KIP-631: The Quorum-based Kafka Controller
Message-Id: <70c597a1-4c5b-4b3a-b267-68f5e758b3f3@www.fastmail.com>
Content-Type: text/plain

On Wed, Dec 2, 2020, at 14:07, Ron Dagostino wrote:
> Hi Colin. Thanks for the updates. It's now clear to me that brokers keep their broker epoch for the life of their JVM -- they register once, get their broker epoch in the response, and then never re-register again. Brokers may get fenced, but they keep the same broker epoch for the life of their JVM. The incarnation ID is also kept for the life of the JVM but is generated by the broker itself upon startup, and the combination of the two allows the Controller to act idempotently if any previously-sent registration response gets lost. Makes sense.
>

Thanks, Ron. That's a good summary.

> One thing I wonder about is if it might be helpful for the broker to send the Cluster ID as determined from its meta.properties file in its registration request. Does it even make sense for the broker to successfully register and enter the Fenced state if it has the wrong Cluster ID?

Yeah, that's a good idea. Let's have the broker pass its cluster ID in the registration RPC, and then registration can fail if the broker is configured for the wrong cluster.

> The nextMetadataOffset value that the broker communicates in its registration request only has meaning within the correct cluster, so it feels to me that the Controller should have some way to perform this sanity check. There is currently (pre-KIP 500) a check in the broker to make sure its configured cluster ID matches the one stored in ZooKeeper, and we will have to perform this validation somewhere in the KIP-500 world. If the Controller doesn't do it within the registration request then the broker will have to make a metadata request to the Controller, retrieve the Cluster ID, and perform the check itself. It feels to me that it might be better for the Controller to just do it, and then the broker doesn't have to worry about it anymore once it successfully registers.
>
> I also have a question about the broker.id value and meta.properties. The KIP now says "In version 0 of meta.properties, there is a broker.id field. Version 1 does not have this field. It is no longer needed because we no longer support dynamic broker id assignment." But then there is an example version 1 meta.properties file that shows the broker.id value. I actually wonder if maybe the broker.id value would be good to keep in the version 1 meta.properties file because it currently (pre-KIP 500, version 0) acts as a sanity check to make sure the broker is using the correct log directory. Similarly with the controller.id value on controllers -- it would allow the same type of sanity check for quorum controllers.
>

That's a good point. I will add broker.id back, and also add controller.id as a possibility.

cheers,
Colin
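For illustration, with broker.id and controller.id restored, a version 1 meta.properties on a broker and on a quorum controller might look roughly like the snippets below; the values, and the controller.id field name, are placeholders and assumptions rather than text taken from the KIP.

    # broker log directory (illustrative values)
    version=1
    cluster.id=<cluster UUID>
    broker.id=1

    # controller metadata directory (illustrative values)
    version=1
    cluster.id=<cluster UUID>
    controller.id=3000

Either field would serve only as the startup sanity check Ron describes: the value in the file is compared against the ID the process is configured with, so a process pointed at the wrong log directory fails fast.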
> On Mon, Nov 30, 2020 at 7:41 PM Colin McCabe wrote:
> >
> > On Fri, Oct 23, 2020, at 16:10, Jun Rao wrote:
> > > Hi, Colin,
> > >
> > > Thanks for the reply. A few more comments.
> >
> > Hi Jun,
> >
> > Thanks again for the reply. Sorry for the long hiatus. I was on vacation for a while.
> >
> > > 55. There is still text that favors new broker registration. "When a broker first starts up, when it is in the INITIAL state, it will always "win" broker ID conflicts. However, once it is granted a lease, it transitions out of the INITIAL state. Thereafter, it may lose subsequent conflicts if its broker epoch is stale. (See KIP-380 for some background on broker epoch.) The reason for favoring new processes is to accommodate the common case where a process is killed with kill -9 and then restarted. We want it to be able to reclaim its old ID quickly in this case."
> > >
> >
> > Thanks for the reminder. I have clarified the language here. Hopefully now it is clear that we don't allow quick re-use of broker IDs.
> >
> > > 80.1 Sounds good. Could you document that listeners is a required config now? It would also be useful to annotate other required configs. For example, controller.connect should be required.
> > >
> >
> > I added a note specifying that these are required.
> >
> > > 80.2 Could you list all deprecated existing configs? Another one is control.plane.listener.name since the controller no longer sends LeaderAndIsr, UpdateMetadata and StopReplica requests.
> > >
> >
> > I added a section specifying some deprecated configs.
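To picture the two required settings called out in 80.1 -- the host names and the exact value formats below are assumptions for illustration, not taken from the KIP:

    # illustrative only; value formats are assumed
    listeners=PLAINTEXT://broker1.example.com:9092
    controller.connect=controller0.example.com:9093,controller1.example.com:9093,controller2.example.com:9093

And, per 80.2, control.plane.listener.name would no longer be set, since the controller no longer sends LeaderAndIsr, UpdateMetadata, or StopReplica requests over a control-plane listener.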
> > > 83.1 It seems that the broker can transition from FENCED to RUNNING without registering for a new broker epoch. I am not sure how this works. Once the controller fences a broker, there is no need for the controller to keep the broker epoch around. So the fenced broker's heartbeat request with the existing broker epoch will be rejected, leading the broker back to the FENCED state again.
> > >
> >
> > The broker epoch refers to the broker registration. So we DO keep the broker epoch around even while the broker is fenced.
> >
> > The broker epoch changes only when there is a new broker registration. Fencing or unfencing the broker doesn't change the broker epoch.
> >
> > > 83.5 Good point on KIP-590. Then should we expose the controller for debugging purposes? If not, we should deprecate the controllerID field in MetadataResponse?
> > >
> >
> > I think it's OK to expose it for now, with the proviso that it won't be reachable by clients.
> >
> > > 90. We rejected the shared ID with just one reason "This is not a good idea because NetworkClient assumes a single ID space. So if there is both a controller 1 and a broker 1, we don't have a way of picking the "right" one." This doesn't seem to be a strong reason. For example, we could address the NetworkClient issue with the node type as you pointed out or using the negative value of a broker ID as the controller ID.
> > >
> >
> > It would require a lot of code changes to support multiple types of node IDs. It's not clear to me that the end result would be better -- I tend to think it would be worse, since it would be more complex. In a similar vein, using negative numbers seems dangerous, since we use negatives or -1 as "special values" in many places. For example, -1 often represents "no such node."
> >
> > One important thing to keep in mind is that we want to be able to transition from a broker and a controller being co-located to them no longer being co-located. This is much easier to do when they have separate IDs.
> >
> > > 100. In KIP-589, the broker reports all offline replicas due to a disk failure to the controller. It seems this information needs to be persisted to the metadata log. Do we have a corresponding record for that?
> > >
> >
> > Hmm, I have to look into this a little bit more. We may need a new record type.
> >
> > > 101. Currently, StopReplica request has 2 modes, without deletion and with deletion. The former is used for controlled shutdown and handling disk failure, and causes the follower to stop. The latter is for topic deletion and partition reassignment, and causes the replica to be deleted. Since we are deprecating StopReplica, could we document what triggers the stopping of a follower and the deleting of a replica now?
> > >
> >
> > RemoveTopic triggers deletion. In general the functionality of StopReplica is subsumed by the metadata records.
> >
> > > 102. Should we include the metadata topic in the MetadataResponse? If so, when will it be included and what will the metadata response look like?
> > >
> >
> > No, it won't be included in the metadata response sent back from the brokers.
> >
> > > 103. "The active controller assigns the broker a new broker epoch, based on the latest committed offset in the log." This seems inaccurate since the latest committed offset doesn't always advance on every log append.
> > >
> >
> > Given that the new broker epoch won't be visible until the commit has happened, I have changed this to "the next available offset in the log".
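A small sketch of the epoch behavior described in 83.1 and 103 (the class and method names here are hypothetical, not the KIP's schema or Kafka's code): the broker epoch is fixed when the registration is committed, taken from the next available metadata log offset, and fencing or unfencing only flips a flag.

    // Hypothetical controller-side bookkeeping, for illustration only.
    final class BrokerRegistrationState {
        private final int brokerId;
        private final long brokerEpoch;   // next available metadata log offset at registration time
        private boolean fenced = true;    // a newly registered broker starts out fenced

        BrokerRegistrationState(int brokerId, long nextMetadataLogOffset) {
            this.brokerId = brokerId;
            this.brokerEpoch = nextMetadataLogOffset;
        }

        int brokerId() { return brokerId; }
        long brokerEpoch() { return brokerEpoch; }
        boolean isFenced() { return fenced; }

        // Fencing and unfencing never touch the epoch; only a brand-new
        // registration (a new record, hence a new object) yields a new epoch.
        void fence()   { fenced = true; }
        void unfence() { fenced = false; }
    }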
> > > 104. REGISTERING(1): It says "Otherwise, the broker moves into the FENCED state." It seems this should be RUNNING?
> > >
> > > 105. RUNNING: Should we require the broker to catch up to the metadata log to get into this state?
> >
> > For 104 and 105, these sections have been reworked.
> >
> > best,
> > Colin
> >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Fri, Oct 23, 2020 at 1:20 PM Colin McCabe wrote:
> > > >
> > > > On Wed, Oct 21, 2020, at 05:51, Tom Bentley wrote:
> > > > > Hi Colin,
> > > > >
> > > > > On Mon, Oct 19, 2020, at 08:59, Ron Dagostino wrote:
> > > > > > > Hi Colin. Thanks for the hard work on this KIP.
> > > > > > >
> > > > > > > I have some questions about what happens to a broker when it becomes fenced (e.g. because it can't send a heartbeat request to keep its lease). The KIP says "When a broker is fenced, it cannot process any client requests. This prevents brokers which are not receiving metadata updates or that are not receiving and processing them fast enough from causing issues to clients." And in the description of the FENCED(4) state it likewise says "While in this state, the broker does not respond to client requests." It makes sense that a fenced broker should not accept producer requests -- I assume any such requests would result in NotLeaderOrFollowerException. But what about KIP-392 (fetch from follower) consumer requests? It is conceivable that these could continue. Related to that, would a fenced broker continue to fetch data for partitions where it thinks it is a follower? Even if it rejects consumer requests it might still continue to fetch as a follower. Might it be helpful to clarify both decisions here?
> > > > > >
> > > > > > Hi Ron,
> > > > > >
> > > > > > Good question. I think a fenced broker should continue to fetch on partitions it was already fetching before it was fenced, unless it hits a problem. At that point it won't be able to continue, since it doesn't have the new metadata. For example, it won't know about leadership changes in the partitions it's fetching. The rationale for continuing to fetch is to try to avoid disruptions as much as possible.
> > > > > >
> > > > > > I don't think fenced brokers should accept client requests. The issue is that the fenced broker may or may not have any data it is supposed to have. It may or may not have applied any configuration changes, etc. that it is supposed to have applied. So it could get pretty confusing, and also potentially waste the client's time.
> > > > >
> > > > > When fenced, how would the broker reply to a client which did make a request?
> > > >
> > > > Hi Tom,
> > > >
> > > > The broker will respond with a retryable error in that case. Once the client has re-fetched its metadata, it will no longer see the fenced broker as part of the cluster. I added a note to the KIP.
> > > >
> > > > best,
> > > > Colin
> > > >
> > > > > Thanks,
> > > > >
> > > > > Tom
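To make the fencing behavior discussed above concrete, here is a rough sketch under stated assumptions (all type and method names are hypothetical, not Kafka's real request-handling code): while fenced, the broker answers client requests with a retryable error, but it leaves its existing replica fetchers running.

    // Illustrative only -- hypothetical types sketching the fencing rules above.
    enum BrokerState { REGISTERING, FENCED, RUNNING }

    final class RequestGate {
        private volatile BrokerState state = BrokerState.FENCED;

        void setState(BrokerState newState) { this.state = newState; }

        // Client-facing requests (produce, consumer fetch, metadata, ...) get a
        // retryable error while the broker is not RUNNING; once the client
        // re-fetches metadata it no longer sees this broker in the cluster.
        boolean shouldRejectClientRequest() {
            return state != BrokerState.RUNNING;
        }

        // Replica fetchers that were already running keep running: fencing does
        // not stop follower fetching for partitions the broker was already
        // following, until stale metadata makes further progress impossible.
        boolean shouldStopExistingReplicaFetchers() {
            return false;
        }
    }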