From: "Colin McCabe"
To: dev@kafka.apache.org
Reply-To: dev@kafka.apache.org
Date: Wed, 02 Dec 2020 14:48:47 -0800
Subject: Re: [DISCUSS] KIP-631: The Quorum-based Kafka Controller
Message-Id: <70c597a1-4c5b-4b3a-b267-68f5e758b3f3@www.fastmail.com>
Content-Type: text/plain

On Wed, Dec 2, 2020, at 14:07, Ron Dagostino wrote:
> Hi Colin. Thanks for the updates. It's now clear to me that brokers keep their broker epoch for the life of their JVM -- they register once, get their broker epoch in the response, and then never re-register again. Brokers may get fenced, but they keep the same broker epoch for the life of their JVM. The incarnation ID is also kept for the life of the JVM but is generated by the broker itself upon startup, and the combination of the two allows the Controller to act idempotently if any previously-sent registration response gets lost. Makes sense.
>

Thanks, Ron. That's a good summary.

> One thing I wonder about is if it might be helpful for the broker to send the Cluster ID as determined from its meta.properties file in its registration request. Does it even make sense for the broker to successfully register and enter the Fenced state if it has the wrong Cluster ID?

Yeah, that's a good idea. Let's have the broker pass its cluster ID in the registration RPC, and then registration can fail if the broker is configured for the wrong cluster.

> The nextMetadataOffset value that the broker communicates in its registration request only has meaning within the correct cluster, so it feels to me that the Controller should have some way to perform this sanity check. There is currently (pre-KIP 500) a check in the broker to make sure its configured cluster ID matches the one stored in ZooKeeper, and we will have to perform this validation somewhere in the KIP-500 world. If the Controller doesn't do it within the registration request then the broker will have to make a metadata request to the Controller, retrieve the Cluster ID, and perform the check itself. It feels to me that it might be better for the Controller to just do it, and then the broker doesn't have to worry about it anymore once it successfully registers.
>
> I also have a question about the broker.id value and meta.properties. The KIP now says "In version 0 of meta.properties, there is a broker.id field. Version 1 does not have this field. It is no longer needed because we no longer support dynamic broker id assignment." But then there is an example version 1 meta.properties file that shows the broker.id value. I actually wonder if maybe the broker.id value would be good to keep in the version 1 meta.properties file because it currently (pre-KIP 500, version 0) acts as a sanity check to make sure the broker is using the correct log directory. Similarly with the controller.id value on controllers -- it would allow the same type of sanity check for quorum controllers.
>

That's a good point. I will add broker.id back, and also add controller.id as a possibility.

cheers,
Colin
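For illustration, with broker.id and controller.id restored, a version 1 meta.properties on a broker and on a quorum controller might look roughly like the snippets below; the values, and the controller.id field name, are placeholders and assumptions rather than text taken from the KIP.

    # broker log directory (illustrative values)
    version=1
    cluster.id=<cluster UUID>
    broker.id=1

    # controller metadata directory (illustrative values)
    version=1
    cluster.id=<cluster UUID>
    controller.id=3000

Either field would serve only as the startup sanity check Ron describes: the value in the file is compared against the ID the process is configured with, so a process pointed at the wrong log directory fails fast.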
> On Mon, Nov 30, 2020 at 7:41 PM Colin McCabe wrote:
> >
> > On Fri, Oct 23, 2020, at 16:10, Jun Rao wrote:
> > > Hi, Colin,
> > >
> > > Thanks for the reply. A few more comments.
> >
> > Hi Jun,
> >
> > Thanks again for the reply. Sorry for the long hiatus. I was on vacation for a while.
> >
> > > 55. There is still text that favors new broker registration. "When a broker first starts up, when it is in the INITIAL state, it will always "win" broker ID conflicts. However, once it is granted a lease, it transitions out of the INITIAL state. Thereafter, it may lose subsequent conflicts if its broker epoch is stale. (See KIP-380 for some background on broker epoch.) The reason for favoring new processes is to accommodate the common case where a process is killed with kill -9 and then restarted. We want it to be able to reclaim its old ID quickly in this case."
> > >
> >
> > Thanks for the reminder. I have clarified the language here. Hopefully now it is clear that we don't allow quick re-use of broker IDs.
> >
> > > 80.1 Sounds good. Could you document that listeners is a required config now? It would also be useful to annotate other required configs. For example, controller.connect should be required.
> > >
> >
> > I added a note specifying that these are required.
> >
> > > 80.2 Could you list all deprecated existing configs? Another one is control.plane.listener.name since the controller no longer sends LeaderAndIsr, UpdateMetadata and StopReplica requests.
> > >
> >
> > I added a section specifying some deprecated configs.
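To picture the two required settings called out in 80.1 -- the host names and the exact value formats below are assumptions for illustration, not taken from the KIP:

    # illustrative only; value formats are assumed
    listeners=PLAINTEXT://broker1.example.com:9092
    controller.connect=controller0.example.com:9093,controller1.example.com:9093,controller2.example.com:9093

And, per 80.2, control.plane.listener.name would no longer be set, since the controller no longer sends LeaderAndIsr, UpdateMetadata, or StopReplica requests over a control-plane listener.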
> > > 83.1 It seems that the broker can transition from FENCED to RUNNING without registering for a new broker epoch. I am not sure how this works. Once the controller fences a broker, there is no need for the controller to keep the broker epoch around. So the fenced broker's heartbeat request with the existing broker epoch will be rejected, leading the broker back to the FENCED state again.
> > >
> >
> > The broker epoch refers to the broker registration. So we DO keep the broker epoch around even while the broker is fenced.
> >
> > The broker epoch changes only when there is a new broker registration. Fencing or unfencing the broker doesn't change the broker epoch.
> >
> > > 83.5 Good point on KIP-590. Then should we expose the controller for debugging purposes? If not, we should deprecate the controllerID field in MetadataResponse?
> > >
> >
> > I think it's OK to expose it for now, with the proviso that it won't be reachable by clients.
> >
> > > 90. We rejected the shared ID with just one reason "This is not a good idea because NetworkClient assumes a single ID space. So if there is both a controller 1 and a broker 1, we don't have a way of picking the "right" one." This doesn't seem to be a strong reason. For example, we could address the NetworkClient issue with the node type as you pointed out or using the negative value of a broker ID as the controller ID.
> > >
> >
> > It would require a lot of code changes to support multiple types of node IDs. It's not clear to me that the end result would be better -- I tend to think it would be worse, since it would be more complex. In a similar vein, using negative numbers seems dangerous, since we use negatives or -1 as "special values" in many places. For example, -1 often represents "no such node."
> >
> > One important thing to keep in mind is that we want to be able to transition from a broker and a controller being co-located to them no longer being co-located. This is much easier to do when they have separate IDs.
> >
> > > 100. In KIP-589, the broker reports all offline replicas due to a disk failure to the controller. It seems this information needs to be persisted to the metadata log. Do we have a corresponding record for that?
> > >
> >
> > Hmm, I have to look into this a little bit more. We may need a new record type.
> >
> > > 101. Currently, StopReplica request has 2 modes, without deletion and with deletion. The former is used for controlled shutdown and handling disk failure, and causes the follower to stop. The latter is for topic deletion and partition reassignment, and causes the replica to be deleted. Since we are deprecating StopReplica, could we document what triggers the stopping of a follower and the deleting of a replica now?
> > >
> >
> > RemoveTopic triggers deletion. In general the functionality of StopReplica is subsumed by the metadata records.
> >
> > > 102. Should we include the metadata topic in the MetadataResponse? If so, when will it be included and what will the metadata response look like?
> > >
> >
> > No, it won't be included in the metadata response sent back from the brokers.
> >
> > > 103. "The active controller assigns the broker a new broker epoch, based on the latest committed offset in the log." This seems inaccurate since the latest committed offset doesn't always advance on every log append.
> > >
> >
> > Given that the new broker epoch won't be visible until the commit has happened, I have changed this to "the next available offset in the log".
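A small sketch of the epoch behavior described in 83.1 and 103 (the class and method names here are hypothetical, not the KIP's schema or Kafka's code): the broker epoch is fixed when the registration is committed, taken from the next available metadata log offset, and fencing or unfencing only flips a flag.

    // Hypothetical controller-side bookkeeping, for illustration only.
    final class BrokerRegistrationState {
        private final int brokerId;
        private final long brokerEpoch;   // next available metadata log offset at registration time
        private boolean fenced = true;    // a newly registered broker starts out fenced

        BrokerRegistrationState(int brokerId, long nextMetadataLogOffset) {
            this.brokerId = brokerId;
            this.brokerEpoch = nextMetadataLogOffset;
        }

        int brokerId() { return brokerId; }
        long brokerEpoch() { return brokerEpoch; }
        boolean isFenced() { return fenced; }

        // Fencing and unfencing never touch the epoch; only a brand-new
        // registration (a new record, hence a new object) yields a new epoch.
        void fence()   { fenced = true; }
        void unfence() { fenced = false; }
    }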
> > > 104. REGISTERING(1): It says "Otherwise, the broker moves into the FENCED state." It seems this should be RUNNING?
> > >
> > > 105. RUNNING: Should we require the broker to catch up to the metadata log to get into this state?
> >
> > For 104 and 105, these sections have been reworked.
> >
> > best,
> > Colin
> >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Fri, Oct 23, 2020 at 1:20 PM Colin McCabe wrote:
> > > >
> > > > On Wed, Oct 21, 2020, at 05:51, Tom Bentley wrote:
> > > > > Hi Colin,
> > > > >
> > > > > On Mon, Oct 19, 2020, at 08:59, Ron Dagostino wrote:
> > > > > > > Hi Colin. Thanks for the hard work on this KIP.
> > > > > > >
> > > > > > > I have some questions about what happens to a broker when it becomes fenced (e.g. because it can't send a heartbeat request to keep its lease). The KIP says "When a broker is fenced, it cannot process any client requests. This prevents brokers which are not receiving metadata updates or that are not receiving and processing them fast enough from causing issues to clients." And in the description of the FENCED(4) state it likewise says "While in this state, the broker does not respond to client requests." It makes sense that a fenced broker should not accept producer requests -- I assume any such requests would result in NotLeaderOrFollowerException. But what about KIP-392 (fetch from follower) consumer requests? It is conceivable that these could continue. Related to that, would a fenced broker continue to fetch data for partitions where it thinks it is a follower? Even if it rejects consumer requests it might still continue to fetch as a follower. Might it be helpful to clarify both decisions here?
> > > > > >
> > > > > > Hi Ron,
> > > > > >
> > > > > > Good question. I think a fenced broker should continue to fetch on partitions it was already fetching before it was fenced, unless it hits a problem. At that point it won't be able to continue, since it doesn't have the new metadata. For example, it won't know about leadership changes in the partitions it's fetching. The rationale for continuing to fetch is to try to avoid disruptions as much as possible.
> > > > > >
> > > > > > I don't think fenced brokers should accept client requests. The issue is that the fenced broker may or may not have any data it is supposed to have. It may or may not have applied any configuration changes, etc. that it is supposed to have applied. So it could get pretty confusing, and also potentially waste the client's time.
> > > > >
> > > > > When fenced, how would the broker reply to a client which did make a request?
> > > >
> > > > Hi Tom,
> > > >
> > > > The broker will respond with a retryable error in that case. Once the client has re-fetched its metadata, it will no longer see the fenced broker as part of the cluster. I added a note to the KIP.
> > > >
> > > > best,
> > > > Colin
> > > >
> > > > > Thanks,
> > > > >
> > > > > Tom
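To make the fencing behavior discussed above concrete, here is a rough sketch under stated assumptions (all type and method names are hypothetical, not Kafka's real request-handling code): while fenced, the broker answers client requests with a retryable error, but it leaves its existing replica fetchers running.

    // Illustrative only -- hypothetical types sketching the fencing rules above.
    enum BrokerState { REGISTERING, FENCED, RUNNING }

    final class RequestGate {
        private volatile BrokerState state = BrokerState.FENCED;

        void setState(BrokerState newState) { this.state = newState; }

        // Client-facing requests (produce, consumer fetch, metadata, ...) get a
        // retryable error while the broker is not RUNNING; once the client
        // re-fetches metadata it no longer sees this broker in the cluster.
        boolean shouldRejectClientRequest() {
            return state != BrokerState.RUNNING;
        }

        // Replica fetchers that were already running keep running: fencing does
        // not stop follower fetching for partitions the broker was already
        // following, until stale metadata makes further progress impossible.
        boolean shouldStopExistingReplicaFetchers() {
            return false;
        }
    }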