From: Dong Lin
Date: Wed, 3 Jan 2018 18:43:00 -0800
Subject: Re: [DISCUSS] KIP-232: Detect outdated metadata by adding ControllerMetadataEpoch field
To: dev@kafka.apache.org

Hey Jun,

Thanks much for the explanation. I understand the advantage of partition_epoch over metadata_epoch. My current concern is that using leader_epoch and partition_epoch requires considerable changes to the consumer's public API to take care of the case where the user stores offsets externally. For example, consumer.commitSync(..) would have to take a map whose value is the (offset, leader_epoch, partition_epoch) tuple, and consumer.seek(...) would also need leader_epoch and partition_epoch as parameters. Technically we can probably still make it work in a backward compatible manner after careful design and discussion. But these changes can make the consumer's interface unnecessarily complex for most users, who do not store offsets externally.

After thinking more about it, we can address all the problems discussed by using only the metadata_epoch, without introducing leader_epoch or partition_epoch. The current KIP describes the changes to the consumer API and how the new API can be used if the user stores offsets externally. In order to address the scenario you described earlier, we can include metadata_epoch in the FetchResponse and the LeaderAndIsrRequest. The consumer remembers the largest metadata_epoch from all the FetchResponses it has received. The metadata_epoch committed with the offset, either within or outside Kafka, should be the largest metadata_epoch across all FetchResponses and MetadataResponses ever received by this consumer.

The drawback of using only the metadata_epoch is that we cannot always do the smart offset reset in the case of unclean leader election which you mentioned earlier.
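The epoch-tracking rule described above can be sketched roughly as follows. This is a hypothetical helper, not actual Kafka code: the class and method names (MetadataEpochTracker, onResponse, epochToCommit) are invented for illustration of the bookkeeping only.

```java
// Hypothetical sketch: a consumer remembers the largest metadata_epoch
// observed across all FetchResponses and MetadataResponses, and that
// maximum is what gets committed alongside the offset (whether the
// offset is stored inside or outside Kafka). Names are illustrative,
// not part of any real Kafka API.
public class MetadataEpochTracker {
    private long maxMetadataEpoch = -1L;

    // Called whenever a FetchResponse or MetadataResponse arrives.
    public void onResponse(long responseMetadataEpoch) {
        if (responseMetadataEpoch > maxMetadataEpoch) {
            maxMetadataEpoch = responseMetadataEpoch;
        }
    }

    // The epoch to commit together with the offset.
    public long epochToCommit() {
        return maxMetadataEpoch;
    }
}
```

Because only the running maximum is kept, out-of-order responses from different brokers cannot move the committed epoch backwards.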
But in most cases, unclean leader election probably happens when the consumer is not rebalancing/restarting. In these cases, either the consumer is not directly affected by the unclean leader election, since it is not consuming from the end of the log, or the consumer can derive the leader_epoch from the most recent message received before it sees OffsetOutOfRangeException. So I am not sure it is worth adding leader_epoch to the consumer API to address the remaining corner case. What do you think?

Thanks,
Dong

On Tue, Jan 2, 2018 at 6:28 PM, Jun Rao wrote:

> Hi, Dong,
>
> Thanks for the reply.
>
> To solve the topic recreation issue, we could use either a global metadata
> version or a partition-level epoch. But either one will be a new concept,
> right? To me, the latter seems more natural. It also makes it easier to
> detect if a consumer's offset is still valid after a topic is recreated. As
> you pointed out, we don't need to store the partition epoch in the message.
> The following is what I am thinking. When a partition is created, we can
> assign a partition epoch from an ever-increasing global counter and store
> it in /brokers/topics/[topic]/partitions/[partitionId] in ZK. The partition
> epoch is propagated to every broker. The consumer will be tracking a tuple
> of (offset, leader epoch, partition epoch) for offsets. If a topic is
> recreated, it's possible that a consumer's offset and leader epoch still
> match those on the broker, but the partition epoch won't. In this case, we
> can potentially still treat the consumer's offset as out of range and reset
> the offset based on the offset reset policy in the consumer. This seems
> harder to do with a global metadata version.
>
> Jun
>
> On Mon, Dec 25, 2017 at 6:56 AM, Dong Lin wrote:
>
> > Hey Jun,
> >
> > This is a very good example. After thinking through this in detail, I
> > agree that we need to commit offset with leader epoch in order to
> > address this example.
> > I think the remaining question is how to address the scenario where the
> > topic is deleted and re-created. One possible solution is to commit the
> > offset with both the leader epoch and the metadata version. The logic
> > and the implementation of this solution do not require a new concept
> > (e.g. partition epoch), and it does not require any change to the
> > message format or leader epoch. It also allows us to order the metadata
> > in a straightforward manner, which may be useful in the future. So it
> > may be a better solution than generating a random partition epoch every
> > time we create a partition. Does this sound reasonable?
> >
> > Previously, one concern with using the metadata version was that the
> > consumer would be forced to refresh metadata even if the metadata
> > version is increased due to topics that the consumer is not interested
> > in. Now I realize that this is probably not a problem. Currently the
> > client will refresh metadata either due to an InvalidMetadataException
> > in the response from the broker or due to metadata expiry. The addition
> > of the metadata version should not increase the overhead of metadata
> > refresh caused by InvalidMetadataException. If the client refreshes
> > metadata due to expiry and receives metadata whose version is lower
> > than the current metadata version, we can reject the metadata but still
> > reset the metadata age, which essentially keeps the existing behavior
> > in the client.
> >
> > Thanks much,
> > Dong
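The partition-epoch check Jun describes in the quoted message above can be sketched as follows. This is a minimal illustration under the assumptions of that proposal; the class and method names (CommittedPosition, isValidAgainst) are hypothetical and not part of any real Kafka API.

```java
// Hypothetical sketch: the consumer tracks a tuple of
// (offset, leader epoch, partition epoch) per partition. On restart, it
// compares its stored partition epoch with the broker's current one. A
// mismatch means the topic was deleted and recreated since the commit,
// so the offset is treated as out of range and the consumer's offset
// reset policy applies. Names are illustrative only.
public class CommittedPosition {
    public final long offset;
    public final int leaderEpoch;
    public final long partitionEpoch;

    public CommittedPosition(long offset, int leaderEpoch, long partitionEpoch) {
        this.offset = offset;
        this.leaderEpoch = leaderEpoch;
        this.partitionEpoch = partitionEpoch;
    }

    // True only if the partition the offset refers to still exists, i.e.
    // the broker reports the same partition epoch we committed with.
    public boolean isValidAgainst(long brokerPartitionEpoch) {
        return this.partitionEpoch == brokerPartitionEpoch;
    }
}
```

The point of the epoch comparison is that offset and leader epoch alone can coincidentally match after recreation, whereas the globally increasing partition epoch never repeats.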