From dev-return-110527-archive-asf-public=cust-asf.ponee.io@kafka.apache.org Wed Jan 15 22:22:25 2020 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id 6EBC218065E for ; Wed, 15 Jan 2020 23:22:25 +0100 (CET) Received: (qmail 9272 invoked by uid 500); 15 Jan 2020 22:22:24 -0000 Mailing-List: contact dev-help@kafka.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@kafka.apache.org Delivered-To: mailing list dev@kafka.apache.org Received: (qmail 9260 invoked by uid 99); 15 Jan 2020 22:22:24 -0000 Received: from Unknown (HELO mailrelay1-lw-us.apache.org) (10.10.3.159) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 15 Jan 2020 22:22:24 +0000 Received: from auth1-smtp.messagingengine.com (auth1-smtp.messagingengine.com [66.111.4.227]) by mailrelay1-lw-us.apache.org (ASF Mail Server at mailrelay1-lw-us.apache.org) with ESMTPSA id E88DE1005 for ; Wed, 15 Jan 2020 22:22:23 +0000 (UTC) Received: from compute7.internal (compute7.nyi.internal [10.202.2.47]) by mailauth.nyi.internal (Postfix) with ESMTP id AD66422129 for ; Wed, 15 Jan 2020 17:22:23 -0500 (EST) Received: from imap34 ([10.202.2.84]) by compute7.internal (MEProxy); Wed, 15 Jan 2020 17:22:23 -0500 X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedugedrtdefgdduheekucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucgoufhushhpvggtthffohhmrghinhculdegledmne cujfgurhepofgfggfkjghffffhvffutgesthdtredtreerjeenucfhrhhomhepfdflohhh nhcutfhovghslhgvrhdfuceovhhvtggvphhhvghisegrphgrtghhvgdrohhrgheqnecuff homhgrihhnpehshhhorhhtuhhrlhdrrghtnecurfgrrhgrmhepmhgrihhlfhhrohhmpehj rhhovghslhgvrhdomhgvshhmthhprghuthhhphgvrhhsohhnrghlihhthidquddtudehie ejjedtgedqvddvieehieejkeegqdhvvhgtvghphhgviheppegrphgrtghhvgdrohhrghes fhgrshhtmhgrihhlrdgtohhmnecuvehluhhsthgvrhfuihiivgeptd X-ME-Proxy: Received: by mailuser.nyi.internal (Postfix, from userid 501) id 664141460061; Wed, 15 Jan 2020 17:22:23 -0500 (EST) X-Mailer: MessagingEngine.com Webmail Interface User-Agent: Cyrus-JMAP/3.1.7-754-g09d1619-fmstable-20200113v1 Mime-Version: 1.0 Message-Id: In-Reply-To: References: <2a609dec-427a-9701-65da-fc3e3737453c@confluent.io> <1d328411-1dec-1260-2807-c88ee330f0a4@confluent.io> <9f267a5b-cfec-f167-b215-f8d43ee43dea@confluent.io> <9ddce8d5-3369-a636-9eb6-4e843be2a95f@confluent.io> Date: Wed, 15 Jan 2020 16:22:08 -0600 From: "John Roesler" To: dev@kafka.apache.org Subject: =?UTF-8?Q?Re:_[DISCUSS]KIP-216:_IQ_should_throw_different_exceptions_for?= =?UTF-8?Q?_different_errors?= Content-Type: text/plain Hey Vito, Yes, thanks for the KIP. Sorry the discussion has been so long. Hopefully, we can close it out soon. I agree we can drop StreamsNotRunningException in favor of just StateStoreNotAvailableException. Unfortunately, I have some higher-level concerns. The value of these exceptions is that they tell you how to handle the various situations that can arise while querying a distributed data store. Ideally, as a caller, I should be able to just catch "retriable" or "fatal" and handle them appropriately. Otherwise, there's no point in having categories, and we should just have all the exceptions extend InvalidStateStoreException. Presently, it's not possible to tell from just the "retriable"/"fatal" distinction what to do. You can tell from the descriptions of the various exceptions. E.g.: Retriable: * StreamsRebalancingException: the exact same call should just be retried until the rebalance is complete * StateStoreMigratedException: the store handle is now invalid, so you need to re-discover the instance and get a new handle on that instance. In other words, the query itself may be valid, but the particular method invocation on this particular instance has encountered a fatal exception. Fatal: * UnknownStateStoreException: this is truly fatal. No amount of retrying or re-discovering is going to get you a handle on a store that doesn't exist in the cluster. * StateStoreNotAvailableException: this is actually recoverable, since the store might exist in the cluster, but isn't available on this particular instance (which is shut down or whatever). Personally, I'm not a fan of code bureaucracy, so I'm 100% fine with omitting the categorization and just having 5 subclasses of InvalidStateStoreException. Each of them would tell you how to handle them, and it's not too many to really understand and handle each one. If you really want to have a middle tier, I'd recommend: * RetryableStateStoreException: the exact same call should be repeated. * RecoverableStateStoreException: the store handle should be discarded and the caller should re-discover the location of the store and repeat the query on the correct instance. * FatalStateStoreException: the query/request is totally invalid and will never succeed. However, attempting to categorize the proposed exceptions reveals even problems with this categorization: Retriable: * StreamsRebalancingException Recoverable: * StateStoreMigratedException * StreamsNotRunningException Fatal: * UnknownStateStoreException But StreamsNotStartedException is strange... It means that one code path got a handle on a specific KafkaStreams object instance and sent it a query before another code path invoked the start() method on the exact same object instance. It seems like the most likely scenario is that whoever wrote the program just forgot to call start() before querying, in which case, retrying isn't going to help, and a fatal exception is more appropriate. I.e., it sounds like a "first 15 minutes experience" problem, and making it fatal would be more helpful. Even in a production context, there's no reason not to sequence your application startup such that you don't accept queries until after Streams is started. Thus, I guess I'd categorize it under "fatal". Regardless of whether you make it fatal or retriable, you'd still have a whole category with only one exception in it, and the other two categories only have two exceptions. Plus, as you pointed out in the KIP, you can't get all exceptions in all cases anyway: * store() can only throw NotStarted, NotRunning, and Unknown * actual store queries can only throw Rebalancing, Migrated, and NotRunning Thus, in practice also, there are exactly three categories and also exactly three exception types. It doesn't seem like there's a great advantage to the categories here. To avoid the categorization problem and also to clarify what exceptions can actually be thrown in different circumstances, it seems like we should just: * get rid of the middle tier and make all the exceptions extend InvalidStateStoreException * drop StateStoreNotAvailableException in favor of StreamsNotRunningException * clearly document on all public methods which exceptions need to be handled How do you feel about this? Thanks, -John On Wed, Jan 15, 2020, at 15:13, Bill Bejeck wrote: > Thanks for KIP Vito. > > Overall the KIP LGTM, but I'd have to agree with others on merging the > `StreamsNotRunningException` and `StateStoreNotAvailableException` classes. > > Since in both cases, the thread state is in `PENDING_SHUTDOWN || > NOT_RUNNING || ERROR` I'm not even sure how we could distinguish when to > use the different > exceptions. Maybe a good middle ground would be to have a detailed > exception message. > > The KIP freeze is close, so I think if we can agree on this, we can wrap up > the voting soon. > > Thanks, > Bill > > On Tue, Jan 14, 2020 at 2:12 PM Matthias J. Sax > wrote: > > > Vito, > > > > It's still unclear to me what the advantage is, to have both > > `StreamsNotRunningException` and `StateStoreNotAvailableException`? > > > > For both cased, the state is `PENDING_SHUTDOWN / NOT_RUNNING / ERROR` > > and thus, for a user point of view, why does it matter if the store is > > closed on not? I don't understand why/how this information would be > > useful? Do you have a concrete example in mind how a user would react > > differently to both exceptions? > > > > > > @Vinoth: about `StreamsRebalancingException` -- to me, it seems best to > > actually do this on a per-query basis, ie, have an overload > > `KafkaStreams#store(...)` that takes a boolean flag that allow to > > _disable_ the exception and opt-in to query a active store during > > recovery. However, as KIP-535 actually introduces this change in > > behavior, I think KIP-216 should not cover this, but KIP-535 should be > > updated. I'll follow up on the other KIP thread to raise this point. > > > > > > -Matthias > > > > On 1/11/20 12:26 AM, Vito Jeng wrote: > > > Hi, Matthias & Vinoth, > > > > > > Thanks for the feedback. > > > > > >> What is still unclear to me is, what we gain by having both > > >> `StreamsNotRunningException` and `StateStoreNotAvailableException`. Both > > >> exception are thrown when KafkaStreams is in state PENDING_SHUTDOWN / > > >> NOT_RUNNING / ERROR. Hence, as a user what do I gain to know if the > > >> state store is closed on not -- I can't query it anyway? Maybe I miss > > >> something thought? > > > > > > Yes, both `StreamsNotRunningException` and > > > `StateStoreNotAvailableException` are fatal exception. > > > But `StateStoreNotAvailableException` is fatal exception about state > > store > > > related. > > > I think it would be helpful that if user need to distinguish these two > > > different case to handle it. > > > > > > I'm not very sure, does that make sense? > > > > > > > > > --- > > > Vito > > > > > > > > > On Fri, Jan 10, 2020 at 3:35 AM Vinoth Chandar > > wrote: > > > > > >> +1 on merging `StreamsNotRunningException` and > > >> `StateStoreNotAvailableException`, both exceptions are fatal anyway. IMO > > >> its best to have these exceptions be about the state store (and not > > streams > > >> state), to easier understanding. > > >> > > >> Additionally, KIP-535 allows for querying of state stores in rebalancing > > >> state. So do we need the StreamsRebalancingException? > > >> > > >> > > >> On 2020/01/09 03:38:11, "Matthias J. Sax" > > wrote: > > >>> Sorry that I dropped the ball on this... > > >>> > > >>> Thanks for updating the KIP. Overall LGTM now. Feel free to start a > > VOTE > > >>> thread. > > >>> > > >>> What is still unclear to me is, what we gain by having both > > >>> `StreamsNotRunningException` and `StateStoreNotAvailableException`. > > Both > > >>> exception are thrown when KafkaStreams is in state PENDING_SHUTDOWN / > > >>> NOT_RUNNING / ERROR. Hence, as a user what do I gain to know if the > > >>> state store is closed on not -- I can't query it anyway? Maybe I miss > > >>> something thought? > > >>> > > >>> > > >>> -Matthias > > >>> > > >>> > > >>> On 11/3/19 6:07 PM, Vito Jeng wrote: > > >>>> Sorry for the late reply, thanks for the review. > > >>>> > > >>>> > > >>>>> About `StateStoreMigratedException`: > > >>>>> > > >>>>> Why is it only thrown if the state is REBALANCING? A store might be > > >>>>> migrated during a rebalance, and Kafka Streams might resume back to > > >>>>> RUNNING state and afterward somebody tries to use an old store > > handle. > > >>>>> Also, if state is REBALANCING, should we throw > > >>>>> `StreamThreadRebalancingException`? Hence, I think > > >>>>> `StateStoreMigratedException` does only make sense during `RUNNING` > > >> state. > > >>>>> > > >>>> > > >>>> Thank you point this, already updated. > > >>>> > > >>>> > > >>>> Why do we need to distinguish between > > `KafkaStreamsNotRunningException` > > >>>>> and `StateStoreNotAvailableException`? > > >>>>> > > >>>> > > >>>> `KafkaStreamsNotRunningException` may be caused by various reasons, I > > >> think > > >>>> it would be helpful that the > > >>>> user can distinguish whether it is caused by the state store closed. > > >>>> (Maybe I am wrong...) > > >>>> > > >>>> > > >>>> Last, why do we distinguish between `KafkaStreams` instance and > > >>>>> `StreamsThread`? To me, it seems we should always refer to the > > >> instance, > > >>>>> because that is the level of granularity in which we enable/disable > > >> IQ atm. > > >>>>> > > >>>> > > >>>> Totally agree. Do you mean the naming of state store exceptions? > > >>>> I don't have special reason to distinguish these two. > > >>>> Your suggestion look more reasonable for the exception naming. > > >>>> > > >>>> > > >>>> Last, for `StateStoreMigratedException`, I would add that a user need > > >> to > > >>>>> rediscover the store and cannot blindly retry as the store handle is > > >>>>> invalid and a new store handle must be retrieved. That is a > > difference > > >>>>> to `StreamThreadRebalancingException` that allows for "blind" retries > > >>>>> that either resolve (if the store is still on the same instance after > > >>>>> rebalancing finishes, or changes to `StateStoreMigratedException` if > > >> the > > >>>>> store was migrated away during rebalancing). > > >>>>> > > >>>> > > >>>> Nice, it's great! Thank you. > > >>>> > > >>>> > > >>>> The KIP already updated, please take a look. :) > > >>>> > > >>>> > > >>>> > > >>>> On Wed, Oct 23, 2019 at 1:48 PM Matthias J. Sax < > > matthias@confluent.io > > >>> > > >>>> wrote: > > >>>> > > >>>>> Any update on this KIP? > > >>>>> > > >>>>> On 10/7/19 3:35 PM, Matthias J. Sax wrote: > > >>>>>> Sorry for the late reply. The 2.4 deadline kept us quite busy. > > >>>>>> > > >>>>>> About `StateStoreMigratedException`: > > >>>>>> > > >>>>>> Why is it only thrown if the state is REBALANCING? A store might be > > >>>>>> migrated during a rebalance, and Kafka Streams might resume back to > > >>>>>> RUNNING state and afterward somebody tries to use an old store > > >> handle. > > >>>>>> Also, if state is REBALANCING, should we throw > > >>>>>> `StreamThreadRebalancingException`? Hence, I think > > >>>>>> `StateStoreMigratedException` does only make sense during `RUNNING` > > >>>>> state. > > >>>>>> > > >>>>>> > > >>>>>> Why do we need to distinguish between > > >> `KafkaStreamsNotRunningException` > > >>>>>> and `StateStoreNotAvailableException`? > > >>>>>> > > >>>>>> > > >>>>>> Last, why do we distinguish between `KafkaStreams` instance and > > >>>>>> `StreamsThread`? To me, it seems we should always refer to the > > >> instance, > > >>>>>> because that is the level of granularity in which we enable/disable > > >> IQ > > >>>>> atm. > > >>>>>> > > >>>>>> > > >>>>>> Last, for `StateStoreMigratedException`, I would add that a user > > >> need to > > >>>>>> rediscover the store and cannot blindly retry as the store handle is > > >>>>>> invalid and a new store handle must be retrieved. That is a > > >> difference > > >>>>>> to `StreamThreadRebalancingException` that allows for "blind" > > retries > > >>>>>> that either resolve (if the store is still on the same instance > > after > > >>>>>> rebalancing finishes, or changes to `StateStoreMigratedException` if > > >> the > > >>>>>> store was migrated away during rebalancing). > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>> -Matthias > > >>>>>> > > >>>>>> On 8/9/19 10:20 AM, Vito Jeng wrote: > > >>>>>>> My bad. The short link `https://shorturl.at/CDNT9` > > > > >> > > >>>>> > > >>>>>>> seems incorrect. > > >>>>>>> > > >>>>>>> Please use the following instead: https://shorturl.at/bkKQU > > >>>>>>> > > >>>>>>> > > >>>>>>> --- > > >>>>>>> Vito > > >>>>>>> > > >>>>>>> > > >>>>>>> On Fri, Aug 9, 2019 at 10:53 AM Vito Jeng > > >> wrote: > > >>>>>>> > > >>>>>>>> Thanks, Matthias! > > >>>>>>>> > > >>>>>>>>> About `StreamThreadNotStartedException`: > > >>>>>>>> > > >>>>>>>> Thank you for explanation. I agree with your opinion. > > >>>>>>>> `CompositeReadOnlyXxxStore#get()` would never throw > > >>>>>>>> `StreamThreadNotStartedException`. > > >>>>>>>> > > >>>>>>>> For the case that corresponding thread crashes after we handed out > > >> the > > >>>>>>>> store handle. We may throw `KafkaStreamsNotRunningException` or > > >>>>>>>> `StateStoreMigratedException`. > > >>>>>>>> In `StreamThreadStateStoreProvider`, we would throw > > >>>>>>>> `KafkaStreamsNotRunningException` when stream thread is not > > >> running( > > >>>>>>>> https://shorturl.at/CDNT9) or throw `StateStoreMigratedException` > > >> when > > >>>>>>>> store is closed(https://shorturl.at/hrvAN). So I think we do not > > >> need > > >>>>> to > > >>>>>>>> add a new type for this case. Does that make sense? > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>> About `KafkaStreamsNotRunningException` vs > > >>>>>>>> `StreamThreadNotRunningException`: > > >>>>>>>> > > >>>>>>>> I understand your point. I rename > > >> `StreamThreadNotRunningException` to > > >>>>>>>> `KafkaStreamsNotRunningException`. > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> About check unknown state store names: > > >>>>>>>> Thank you for the hint. I add a new type > > >> `UnknownStateStoreException` > > >>>>> for > > >>>>>>>> this case. > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>> Also, we should still have fatal exception > > >>>>>>>> `StateStoreNotAvailableException`? Not sure why you remove it? > > >>>>>>>> > > >>>>>>>> Thank you point this, already add it again. > > >>>>>>>> > > >>>>>>>> The KIP already updated, please take a look. > > >>>>>>>> > > >>>>>>>> --- > > >>>>>>>> Vito > > >>>>>>>> > > >>>>>>> > > >>>>>> > > >>>>> > > >>>>> > > >>>> > > >>> > > >>> > > >> > > > > > > > >