kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vito Jeng <v...@is-land.com.tw>
Subject Re: [DISCUSS]KIP-216: IQ should throw different exceptions for different errors
Date Wed, 15 Jan 2020 22:35:08 GMT
Hi, folks,

Thank you suggestion, really appreciate it. :)
I understand your concern. I'll merge StreamsNotRunningException and
StateStoreNotAvailableException.


---
Vito


On Thu, Jan 16, 2020 at 6:22 AM John Roesler <vvcephei@apache.org> wrote:

> Hey Vito,
>
> Yes, thanks for the KIP. Sorry the discussion has been so long.
> Hopefully, we can close it out soon.
>
> I agree we can drop StreamsNotRunningException in favor of
> just StateStoreNotAvailableException.
>
> Unfortunately, I have some higher-level concerns. The value
> of these exceptions is that they tell you how to handle the
> various situations that can arise while querying a distributed
> data store.
>
> Ideally, as a caller, I should be able to just catch "retriable" or
> "fatal" and handle them appropriately. Otherwise, there's no
> point in having categories, and we should just have all the
> exceptions extend InvalidStateStoreException.
>
> Presently, it's not possible to tell from just the
> "retriable"/"fatal" distinction what to do. You  can tell
> from the descriptions of the various exceptions. E.g.:
>
> Retriable:
>  * StreamsRebalancingException: the exact same call
>     should just be retried until the rebalance is complete
>  * StateStoreMigratedException: the store handle is
>     now invalid, so you need to re-discover the instance
>     and get a new handle on that instance. In other words,
>     the query itself may be valid, but the particular method
>     invocation on this particular instance has encountered
>     a fatal exception.
>
> Fatal:
>  * UnknownStateStoreException: this is truly fatal. No amount
>     of retrying or re-discovering is going to get you a handle on a
>     store that doesn't exist in the cluster.
>  * StateStoreNotAvailableException: this is actually recoverable,
>     since the store might exist in the cluster, but isn't available on
>     this particular instance (which is shut down or whatever).
>
> Personally, I'm not a fan of code bureaucracy, so I'm 100% fine
> with omitting the categorization and just having 5 subclasses
> of InvalidStateStoreException. Each of them would tell you
> how to handle them, and it's not too many to really
> understand and handle each one.
>
> If you really want to have a middle tier, I'd recommend:
> * RetryableStateStoreException: the exact same call
>     should be repeated.
> * RecoverableStateStoreException: the store handle
>     should be discarded and the caller should re-discover
>     the location of the store and repeat the query on the
>     correct instance.
> * FatalStateStoreException: the query/request is totally
>     invalid and will never succeed.
>
> However, attempting to categorize the proposed exceptions
> reveals even problems with this categorization:
> Retriable:
> * StreamsRebalancingException
> Recoverable:
> * StateStoreMigratedException
> * StreamsNotRunningException
> Fatal:
> * UnknownStateStoreException
>
> But StreamsNotStartedException is strange... It means that
> one code path got a handle on a specific KafkaStreams object
> instance and sent it a query before another code path
> invoked the start() method on the exact same object instance.
> It seems like the most likely scenario is that whoever wrote
> the program just forgot to call start() before querying, in
> which case, retrying isn't going to help, and a fatal exception
> is more appropriate. I.e., it sounds like a "first 15 minutes
> experience" problem, and making it fatal would be more
> helpful. Even in a production context, there's no reason not
> to sequence your application startup such that you don't
> accept queries until after Streams is started. Thus, I guess
> I'd categorize it under "fatal".
>
> Regardless of whether you make it fatal or retriable, you'd
> still have a whole category with only one exception in it,
> and the other two categories only have two exceptions.
> Plus, as you pointed out in the KIP, you can't get all
> exceptions in all cases anyway:
> * store() can only throw NotStarted, NotRunning,
>     and Unknown
> * actual store queries can only throw Rebalancing,
>     Migrated, and NotRunning
>
> Thus, in practice also, there are exactly three categories
> and also exactly three exception types. It doesn't seem
> like there's a great advantage to the categories here. To
> avoid the categorization problem and also to clarify what
> exceptions can actually be thrown in different circumstances,
> it seems like we should just:
> * get rid of the middle tier and make all the exceptions
>     extend InvalidStateStoreException
> * drop StateStoreNotAvailableException in favor of
>     StreamsNotRunningException
> * clearly document on all public methods which exceptions
>     need to be handled
>
> How do you feel about this?
> Thanks,
> -John
>
> On Wed, Jan 15, 2020, at 15:13, Bill Bejeck wrote:
> > Thanks for KIP Vito.
> >
> > Overall the KIP LGTM, but I'd have to agree with others on merging the
> > `StreamsNotRunningException` and `StateStoreNotAvailableException`
> classes.
> >
> > Since in both cases, the thread state is in `PENDING_SHUTDOWN ||
> > NOT_RUNNING || ERROR` I'm not even sure how we could distinguish when to
> > use the different
> > exceptions.  Maybe a good middle ground would be to have a detailed
> > exception message.
> >
> > The KIP freeze is close, so I think if we can agree on this, we can wrap
> up
> > the voting soon.
> >
> > Thanks,
> > Bill
> >
> > On Tue, Jan 14, 2020 at 2:12 PM Matthias J. Sax <matthias@confluent.io>
> > wrote:
> >
> > > Vito,
> > >
> > > It's still unclear to me what the advantage is, to have both
> > > `StreamsNotRunningException` and `StateStoreNotAvailableException`?
> > >
> > > For both cased, the state is `PENDING_SHUTDOWN / NOT_RUNNING / ERROR`
> > > and thus, for a user point of view, why does it matter if the store is
> > > closed on not? I don't understand why/how this information would be
> > > useful? Do you have a concrete example in mind how a user would react
> > > differently to both exceptions?
> > >
> > >
> > > @Vinoth: about `StreamsRebalancingException` -- to me, it seems best to
> > > actually do this on a per-query basis, ie, have an overload
> > > `KafkaStreams#store(...)` that takes a boolean flag that allow to
> > > _disable_ the exception and opt-in to query a active store during
> > > recovery. However, as KIP-535 actually introduces this change in
> > > behavior, I think KIP-216 should not cover this, but KIP-535 should be
> > > updated. I'll follow up on the other KIP thread to raise this point.
> > >
> > >
> > > -Matthias
> > >
> > > On 1/11/20 12:26 AM, Vito Jeng wrote:
> > > > Hi, Matthias & Vinoth,
> > > >
> > > > Thanks for the feedback.
> > > >
> > > >> What is still unclear to me is, what we gain by having both
> > > >> `StreamsNotRunningException` and `StateStoreNotAvailableException`.
> Both
> > > >> exception are thrown when KafkaStreams is in state PENDING_SHUTDOWN
> /
> > > >> NOT_RUNNING / ERROR. Hence, as a user what do I gain to know if the
> > > >> state store is closed on not -- I can't query it anyway? Maybe I
> miss
> > > >> something thought?
> > > >
> > > > Yes, both `StreamsNotRunningException` and
> > > > `StateStoreNotAvailableException` are fatal exception.
> > > > But `StateStoreNotAvailableException` is fatal exception about state
> > > store
> > > > related.
> > > > I think it would be helpful that if user need to distinguish these
> two
> > > > different case to handle it.
> > > >
> > > > I'm not very sure, does that make sense?
> > > >
> > > >
> > > > ---
> > > > Vito
> > > >
> > > >
> > > > On Fri, Jan 10, 2020 at 3:35 AM Vinoth Chandar <vinoth@apache.org>
> > > wrote:
> > > >
> > > >> +1 on merging `StreamsNotRunningException` and
> > > >> `StateStoreNotAvailableException`, both exceptions are fatal
> anyway. IMO
> > > >> its best to have these exceptions be about the state store (and not
> > > streams
> > > >> state), to easier understanding.
> > > >>
> > > >> Additionally, KIP-535 allows for querying of state stores in
> rebalancing
> > > >> state. So do we need the StreamsRebalancingException?
> > > >>
> > > >>
> > > >> On 2020/01/09 03:38:11, "Matthias J. Sax" <matthias@confluent.io>
> > > wrote:
> > > >>> Sorry that I dropped the ball on this...
> > > >>>
> > > >>> Thanks for updating the KIP. Overall LGTM now. Feel free to start
a
> > > VOTE
> > > >>> thread.
> > > >>>
> > > >>> What is still unclear to me is, what we gain by having both
> > > >>> `StreamsNotRunningException` and `StateStoreNotAvailableException`.
> > > Both
> > > >>> exception are thrown when KafkaStreams is in state
> PENDING_SHUTDOWN /
> > > >>> NOT_RUNNING / ERROR. Hence, as a user what do I gain to know if
the
> > > >>> state store is closed on not -- I can't query it anyway? Maybe
I
> miss
> > > >>> something thought?
> > > >>>
> > > >>>
> > > >>> -Matthias
> > > >>>
> > > >>>
> > > >>> On 11/3/19 6:07 PM, Vito Jeng wrote:
> > > >>>> Sorry for the late reply, thanks for the review.
> > > >>>>
> > > >>>>
> > > >>>>> About `StateStoreMigratedException`:
> > > >>>>>
> > > >>>>> Why is it only thrown if the state is REBALANCING? A store
might
> be
> > > >>>>> migrated during a rebalance, and Kafka Streams might resume
back
> to
> > > >>>>> RUNNING state and afterward somebody tries to use an old
store
> > > handle.
> > > >>>>> Also, if state is REBALANCING, should we throw
> > > >>>>> `StreamThreadRebalancingException`? Hence, I think
> > > >>>>> `StateStoreMigratedException` does only make sense during
> `RUNNING`
> > > >> state.
> > > >>>>>
> > > >>>>
> > > >>>> Thank you point this, already updated.
> > > >>>>
> > > >>>>
> > > >>>> Why do we need to distinguish between
> > > `KafkaStreamsNotRunningException`
> > > >>>>> and `StateStoreNotAvailableException`?
> > > >>>>>
> > > >>>>
> > > >>>> `KafkaStreamsNotRunningException` may be caused by various
> reasons, I
> > > >> think
> > > >>>> it would be helpful that the
> > > >>>> user can distinguish whether it is caused by the state store
> closed.
> > > >>>> (Maybe I am wrong...)
> > > >>>>
> > > >>>>
> > > >>>> Last, why do we distinguish between `KafkaStreams` instance
and
> > > >>>>> `StreamsThread`? To me, it seems we should always refer
to the
> > > >> instance,
> > > >>>>> because that is the level of granularity in which we
> enable/disable
> > > >> IQ atm.
> > > >>>>>
> > > >>>>
> > > >>>> Totally agree. Do you mean the naming of state store exceptions?
> > > >>>> I don't have special reason to distinguish these two.
> > > >>>> Your suggestion look more reasonable for the exception naming.
> > > >>>>
> > > >>>>
> > > >>>> Last, for `StateStoreMigratedException`, I would add that
a user
> need
> > > >> to
> > > >>>>> rediscover the store and cannot blindly retry as the store
> handle is
> > > >>>>> invalid and a new store handle must be retrieved. That
is a
> > > difference
> > > >>>>> to `StreamThreadRebalancingException` that allows for
"blind"
> retries
> > > >>>>> that either resolve (if the store is still on the same
instance
> after
> > > >>>>> rebalancing finishes, or changes to
> `StateStoreMigratedException` if
> > > >> the
> > > >>>>> store was migrated away during rebalancing).
> > > >>>>>
> > > >>>>
> > > >>>> Nice, it's great! Thank you.
> > > >>>>
> > > >>>>
> > > >>>> The KIP already updated, please take a look. :)
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> On Wed, Oct 23, 2019 at 1:48 PM Matthias J. Sax <
> > > matthias@confluent.io
> > > >>>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> Any update on this KIP?
> > > >>>>>
> > > >>>>> On 10/7/19 3:35 PM, Matthias J. Sax wrote:
> > > >>>>>> Sorry for the late reply. The 2.4 deadline kept us
quite busy.
> > > >>>>>>
> > > >>>>>> About `StateStoreMigratedException`:
> > > >>>>>>
> > > >>>>>> Why is it only thrown if the state is REBALANCING?
A store
> might be
> > > >>>>>> migrated during a rebalance, and Kafka Streams might
resume
> back to
> > > >>>>>> RUNNING state and afterward somebody tries to use
an old store
> > > >> handle.
> > > >>>>>> Also, if state is REBALANCING, should we throw
> > > >>>>>> `StreamThreadRebalancingException`? Hence, I think
> > > >>>>>> `StateStoreMigratedException` does only make sense
during
> `RUNNING`
> > > >>>>> state.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> Why do we need to distinguish between
> > > >> `KafkaStreamsNotRunningException`
> > > >>>>>> and `StateStoreNotAvailableException`?
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> Last, why do we distinguish between `KafkaStreams`
instance and
> > > >>>>>> `StreamsThread`? To me, it seems we should always
refer to the
> > > >> instance,
> > > >>>>>> because that is the level of granularity in which
we
> enable/disable
> > > >> IQ
> > > >>>>> atm.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> Last, for `StateStoreMigratedException`, I would add
that a user
> > > >> need to
> > > >>>>>> rediscover the store and cannot blindly retry as the
store
> handle is
> > > >>>>>> invalid and a new store handle must be retrieved.
That is a
> > > >> difference
> > > >>>>>> to `StreamThreadRebalancingException` that allows
for "blind"
> > > retries
> > > >>>>>> that either resolve (if the store is still on the
same instance
> > > after
> > > >>>>>> rebalancing finishes, or changes to
> `StateStoreMigratedException` if
> > > >> the
> > > >>>>>> store was migrated away during rebalancing).
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> -Matthias
> > > >>>>>>
> > > >>>>>> On 8/9/19 10:20 AM, Vito Jeng wrote:
> > > >>>>>>> My bad. The short link `https://shorturl.at/CDNT9`
> <https://shorturl.at/CDNT9>
> > > <https://shorturl.at/CDNT9>
> > > >> <https://shorturl.at/CDNT9>
> > > >>>>> <https://shorturl.at/CDNT9>
> > > >>>>>>> <https://shorturl.at/CDNT9> seems incorrect.
> > > >>>>>>>
> > > >>>>>>> Please use the following instead: https://shorturl.at/bkKQU
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> ---
> > > >>>>>>> Vito
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> On Fri, Aug 9, 2019 at 10:53 AM Vito Jeng <vito@is-land.com.tw
> >
> > > >> wrote:
> > > >>>>>>>
> > > >>>>>>>> Thanks, Matthias!
> > > >>>>>>>>
> > > >>>>>>>>> About `StreamThreadNotStartedException`:
> > > >>>>>>>>
> > > >>>>>>>> Thank you for explanation. I agree with your
opinion.
> > > >>>>>>>> `CompositeReadOnlyXxxStore#get()` would never
throw
> > > >>>>>>>> `StreamThreadNotStartedException`.
> > > >>>>>>>>
> > > >>>>>>>> For the case that corresponding thread crashes
after we
> handed out
> > > >> the
> > > >>>>>>>> store handle. We may throw `KafkaStreamsNotRunningException`
> or
> > > >>>>>>>> `StateStoreMigratedException`.
> > > >>>>>>>> In `StreamThreadStateStoreProvider`, we would
throw
> > > >>>>>>>> `KafkaStreamsNotRunningException` when stream
thread is not
> > > >> running(
> > > >>>>>>>> https://shorturl.at/CDNT9) or throw
> `StateStoreMigratedException`
> > > >> when
> > > >>>>>>>> store is closed(https://shorturl.at/hrvAN).
So I think we do
> not
> > > >> need
> > > >>>>> to
> > > >>>>>>>> add a new type for this case. Does that make
sense?
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>> About `KafkaStreamsNotRunningException`
vs
> > > >>>>>>>> `StreamThreadNotRunningException`:
> > > >>>>>>>>
> > > >>>>>>>> I understand your point. I rename
> > > >> `StreamThreadNotRunningException` to
> > > >>>>>>>> `KafkaStreamsNotRunningException`.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> About check unknown state store names:
> > > >>>>>>>> Thank you for the hint. I add a new type
> > > >> `UnknownStateStoreException`
> > > >>>>> for
> > > >>>>>>>> this case.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>> Also, we should still have fatal exception
> > > >>>>>>>> `StateStoreNotAvailableException`? Not sure
why you remove it?
> > > >>>>>>>>
> > > >>>>>>>> Thank you point this, already add it again.
> > > >>>>>>>>
> > > >>>>>>>> The KIP already updated, please take a look.
> > > >>>>>>>>
> > > >>>>>>>> ---
> > > >>>>>>>> Vito
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>>
> > > >>
> > > >
> > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message