kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "John Roesler" <vvcep...@apache.org>
Subject Re: [DISCUSS]KIP-216: IQ should throw different exceptions for different errors
Date Thu, 16 Jan 2020 04:30:52 GMT
Hi Vito,

Haha, your archive game is on point!

What Matthias said in that email is essentially what I figured was the rationale. It makes
sense, but the point I was making is that this really doesn’t seem like a good way to structure
a production app. On the other hand, considering the exception fatal has a good chance of
avoiding a frustrating debug session if you just forgot to call start. 

Nevertheless, if we omit the categorization, it’s moot.

It would be easy to add a categorization layer later if we want it, but not very easy to change
it if we get it wrong. 

Thanks for your consideration!
-John

On Wed, Jan 15, 2020, at 21:14, Vito Jeng wrote:
> Hi John,
> 
> About `StreamsNotStartedException is strange` --
> The original idea came from Matthias, two years ago. :)
> You can reference here:
> https://mail-archives.apache.org/mod_mbox/kafka-dev/201806.mbox/%3c6c32083e-b63c-435b-521d-032d45cc518f@confluent.io%3e
> 
> About omitting the categorization --
> It looks reasonable. I'm fine with omitting the categorization but not very
> sure it is a good choice.
> Does any other folks provide opinion?
> 
> 
> Hi, folks,
> 
> Just update the KIP-216, please take a look.
> 
> ---
> Vito
> 
> 
> On Thu, Jan 16, 2020 at 6:35 AM Vito Jeng <vito@is-land.com.tw> wrote:
> 
> >
> > Hi, folks,
> >
> > Thank you suggestion, really appreciate it. :)
> > I understand your concern. I'll merge StreamsNotRunningException and
> > StateStoreNotAvailableException.
> >
> >
> > ---
> > Vito
> >
> >
> > On Thu, Jan 16, 2020 at 6:22 AM John Roesler <vvcephei@apache.org> wrote:
> >
> >> Hey Vito,
> >>
> >> Yes, thanks for the KIP. Sorry the discussion has been so long.
> >> Hopefully, we can close it out soon.
> >>
> >> I agree we can drop StreamsNotRunningException in favor of
> >> just StateStoreNotAvailableException.
> >>
> >> Unfortunately, I have some higher-level concerns. The value
> >> of these exceptions is that they tell you how to handle the
> >> various situations that can arise while querying a distributed
> >> data store.
> >>
> >> Ideally, as a caller, I should be able to just catch "retriable" or
> >> "fatal" and handle them appropriately. Otherwise, there's no
> >> point in having categories, and we should just have all the
> >> exceptions extend InvalidStateStoreException.
> >>
> >> Presently, it's not possible to tell from just the
> >> "retriable"/"fatal" distinction what to do. You  can tell
> >> from the descriptions of the various exceptions. E.g.:
> >>
> >> Retriable:
> >>  * StreamsRebalancingException: the exact same call
> >>     should just be retried until the rebalance is complete
> >>  * StateStoreMigratedException: the store handle is
> >>     now invalid, so you need to re-discover the instance
> >>     and get a new handle on that instance. In other words,
> >>     the query itself may be valid, but the particular method
> >>     invocation on this particular instance has encountered
> >>     a fatal exception.
> >>
> >> Fatal:
> >>  * UnknownStateStoreException: this is truly fatal. No amount
> >>     of retrying or re-discovering is going to get you a handle on a
> >>     store that doesn't exist in the cluster.
> >>  * StateStoreNotAvailableException: this is actually recoverable,
> >>     since the store might exist in the cluster, but isn't available on
> >>     this particular instance (which is shut down or whatever).
> >>
> >> Personally, I'm not a fan of code bureaucracy, so I'm 100% fine
> >> with omitting the categorization and just having 5 subclasses
> >> of InvalidStateStoreException. Each of them would tell you
> >> how to handle them, and it's not too many to really
> >> understand and handle each one.
> >>
> >> If you really want to have a middle tier, I'd recommend:
> >> * RetryableStateStoreException: the exact same call
> >>     should be repeated.
> >> * RecoverableStateStoreException: the store handle
> >>     should be discarded and the caller should re-discover
> >>     the location of the store and repeat the query on the
> >>     correct instance.
> >> * FatalStateStoreException: the query/request is totally
> >>     invalid and will never succeed.
> >>
> >> However, attempting to categorize the proposed exceptions
> >> reveals even problems with this categorization:
> >> Retriable:
> >> * StreamsRebalancingException
> >> Recoverable:
> >> * StateStoreMigratedException
> >> * StreamsNotRunningException
> >> Fatal:
> >> * UnknownStateStoreException
> >>
> >> But StreamsNotStartedException is strange... It means that
> >> one code path got a handle on a specific KafkaStreams object
> >> instance and sent it a query before another code path
> >> invoked the start() method on the exact same object instance.
> >> It seems like the most likely scenario is that whoever wrote
> >> the program just forgot to call start() before querying, in
> >> which case, retrying isn't going to help, and a fatal exception
> >> is more appropriate. I.e., it sounds like a "first 15 minutes
> >> experience" problem, and making it fatal would be more
> >> helpful. Even in a production context, there's no reason not
> >> to sequence your application startup such that you don't
> >> accept queries until after Streams is started. Thus, I guess
> >> I'd categorize it under "fatal".
> >>
> >> Regardless of whether you make it fatal or retriable, you'd
> >> still have a whole category with only one exception in it,
> >> and the other two categories only have two exceptions.
> >> Plus, as you pointed out in the KIP, you can't get all
> >> exceptions in all cases anyway:
> >> * store() can only throw NotStarted, NotRunning,
> >>     and Unknown
> >> * actual store queries can only throw Rebalancing,
> >>     Migrated, and NotRunning
> >>
> >> Thus, in practice also, there are exactly three categories
> >> and also exactly three exception types. It doesn't seem
> >> like there's a great advantage to the categories here. To
> >> avoid the categorization problem and also to clarify what
> >> exceptions can actually be thrown in different circumstances,
> >> it seems like we should just:
> >> * get rid of the middle tier and make all the exceptions
> >>     extend InvalidStateStoreException
> >> * drop StateStoreNotAvailableException in favor of
> >>     StreamsNotRunningException
> >> * clearly document on all public methods which exceptions
> >>     need to be handled
> >>
> >> How do you feel about this?
> >> Thanks,
> >> -John
> >>
> >> On Wed, Jan 15, 2020, at 15:13, Bill Bejeck wrote:
> >> > Thanks for KIP Vito.
> >> >
> >> > Overall the KIP LGTM, but I'd have to agree with others on merging the
> >> > `StreamsNotRunningException` and `StateStoreNotAvailableException`
> >> classes.
> >> >
> >> > Since in both cases, the thread state is in `PENDING_SHUTDOWN ||
> >> > NOT_RUNNING || ERROR` I'm not even sure how we could distinguish when to
> >> > use the different
> >> > exceptions.  Maybe a good middle ground would be to have a detailed
> >> > exception message.
> >> >
> >> > The KIP freeze is close, so I think if we can agree on this, we can
> >> wrap up
> >> > the voting soon.
> >> >
> >> > Thanks,
> >> > Bill
> >> >
> >> > On Tue, Jan 14, 2020 at 2:12 PM Matthias J. Sax <matthias@confluent.io>
> >> > wrote:
> >> >
> >> > > Vito,
> >> > >
> >> > > It's still unclear to me what the advantage is, to have both
> >> > > `StreamsNotRunningException` and `StateStoreNotAvailableException`?
> >> > >
> >> > > For both cased, the state is `PENDING_SHUTDOWN / NOT_RUNNING / ERROR`
> >> > > and thus, for a user point of view, why does it matter if the store
is
> >> > > closed on not? I don't understand why/how this information would be
> >> > > useful? Do you have a concrete example in mind how a user would react
> >> > > differently to both exceptions?
> >> > >
> >> > >
> >> > > @Vinoth: about `StreamsRebalancingException` -- to me, it seems best
> >> to
> >> > > actually do this on a per-query basis, ie, have an overload
> >> > > `KafkaStreams#store(...)` that takes a boolean flag that allow to
> >> > > _disable_ the exception and opt-in to query a active store during
> >> > > recovery. However, as KIP-535 actually introduces this change in
> >> > > behavior, I think KIP-216 should not cover this, but KIP-535 should
be
> >> > > updated. I'll follow up on the other KIP thread to raise this point.
> >> > >
> >> > >
> >> > > -Matthias
> >> > >
> >> > > On 1/11/20 12:26 AM, Vito Jeng wrote:
> >> > > > Hi, Matthias & Vinoth,
> >> > > >
> >> > > > Thanks for the feedback.
> >> > > >
> >> > > >> What is still unclear to me is, what we gain by having both
> >> > > >> `StreamsNotRunningException` and
> >> `StateStoreNotAvailableException`. Both
> >> > > >> exception are thrown when KafkaStreams is in state
> >> PENDING_SHUTDOWN /
> >> > > >> NOT_RUNNING / ERROR. Hence, as a user what do I gain to know
if the
> >> > > >> state store is closed on not -- I can't query it anyway?
Maybe I
> >> miss
> >> > > >> something thought?
> >> > > >
> >> > > > Yes, both `StreamsNotRunningException` and
> >> > > > `StateStoreNotAvailableException` are fatal exception.
> >> > > > But `StateStoreNotAvailableException` is fatal exception about
state
> >> > > store
> >> > > > related.
> >> > > > I think it would be helpful that if user need to distinguish
these
> >> two
> >> > > > different case to handle it.
> >> > > >
> >> > > > I'm not very sure, does that make sense?
> >> > > >
> >> > > >
> >> > > > ---
> >> > > > Vito
> >> > > >
> >> > > >
> >> > > > On Fri, Jan 10, 2020 at 3:35 AM Vinoth Chandar <vinoth@apache.org>
> >> > > wrote:
> >> > > >
> >> > > >> +1 on merging `StreamsNotRunningException` and
> >> > > >> `StateStoreNotAvailableException`, both exceptions are fatal
> >> anyway. IMO
> >> > > >> its best to have these exceptions be about the state store
(and not
> >> > > streams
> >> > > >> state), to easier understanding.
> >> > > >>
> >> > > >> Additionally, KIP-535 allows for querying of state stores
in
> >> rebalancing
> >> > > >> state. So do we need the StreamsRebalancingException?
> >> > > >>
> >> > > >>
> >> > > >> On 2020/01/09 03:38:11, "Matthias J. Sax" <matthias@confluent.io>
> >> > > wrote:
> >> > > >>> Sorry that I dropped the ball on this...
> >> > > >>>
> >> > > >>> Thanks for updating the KIP. Overall LGTM now. Feel free
to start
> >> a
> >> > > VOTE
> >> > > >>> thread.
> >> > > >>>
> >> > > >>> What is still unclear to me is, what we gain by having
both
> >> > > >>> `StreamsNotRunningException` and
> >> `StateStoreNotAvailableException`.
> >> > > Both
> >> > > >>> exception are thrown when KafkaStreams is in state
> >> PENDING_SHUTDOWN /
> >> > > >>> NOT_RUNNING / ERROR. Hence, as a user what do I gain
to know if
> >> the
> >> > > >>> state store is closed on not -- I can't query it anyway?
Maybe I
> >> miss
> >> > > >>> something thought?
> >> > > >>>
> >> > > >>>
> >> > > >>> -Matthias
> >> > > >>>
> >> > > >>>
> >> > > >>> On 11/3/19 6:07 PM, Vito Jeng wrote:
> >> > > >>>> Sorry for the late reply, thanks for the review.
> >> > > >>>>
> >> > > >>>>
> >> > > >>>>> About `StateStoreMigratedException`:
> >> > > >>>>>
> >> > > >>>>> Why is it only thrown if the state is REBALANCING?
A store
> >> might be
> >> > > >>>>> migrated during a rebalance, and Kafka Streams
might resume
> >> back to
> >> > > >>>>> RUNNING state and afterward somebody tries to
use an old store
> >> > > handle.
> >> > > >>>>> Also, if state is REBALANCING, should we throw
> >> > > >>>>> `StreamThreadRebalancingException`? Hence, I
think
> >> > > >>>>> `StateStoreMigratedException` does only make
sense during
> >> `RUNNING`
> >> > > >> state.
> >> > > >>>>>
> >> > > >>>>
> >> > > >>>> Thank you point this, already updated.
> >> > > >>>>
> >> > > >>>>
> >> > > >>>> Why do we need to distinguish between
> >> > > `KafkaStreamsNotRunningException`
> >> > > >>>>> and `StateStoreNotAvailableException`?
> >> > > >>>>>
> >> > > >>>>
> >> > > >>>> `KafkaStreamsNotRunningException` may be caused by
various
> >> reasons, I
> >> > > >> think
> >> > > >>>> it would be helpful that the
> >> > > >>>> user can distinguish whether it is caused by the
state store
> >> closed.
> >> > > >>>> (Maybe I am wrong...)
> >> > > >>>>
> >> > > >>>>
> >> > > >>>> Last, why do we distinguish between `KafkaStreams`
instance and
> >> > > >>>>> `StreamsThread`? To me, it seems we should always
refer to the
> >> > > >> instance,
> >> > > >>>>> because that is the level of granularity in which
we
> >> enable/disable
> >> > > >> IQ atm.
> >> > > >>>>>
> >> > > >>>>
> >> > > >>>> Totally agree. Do you mean the naming of state store
exceptions?
> >> > > >>>> I don't have special reason to distinguish these
two.
> >> > > >>>> Your suggestion look more reasonable for the exception
naming.
> >> > > >>>>
> >> > > >>>>
> >> > > >>>> Last, for `StateStoreMigratedException`, I would
add that a user
> >> need
> >> > > >> to
> >> > > >>>>> rediscover the store and cannot blindly retry
as the store
> >> handle is
> >> > > >>>>> invalid and a new store handle must be retrieved.
That is a
> >> > > difference
> >> > > >>>>> to `StreamThreadRebalancingException` that allows
for "blind"
> >> retries
> >> > > >>>>> that either resolve (if the store is still on
the same instance
> >> after
> >> > > >>>>> rebalancing finishes, or changes to
> >> `StateStoreMigratedException` if
> >> > > >> the
> >> > > >>>>> store was migrated away during rebalancing).
> >> > > >>>>>
> >> > > >>>>
> >> > > >>>> Nice, it's great! Thank you.
> >> > > >>>>
> >> > > >>>>
> >> > > >>>> The KIP already updated, please take a look. :)
> >> > > >>>>
> >> > > >>>>
> >> > > >>>>
> >> > > >>>> On Wed, Oct 23, 2019 at 1:48 PM Matthias J. Sax <
> >> > > matthias@confluent.io
> >> > > >>>
> >> > > >>>> wrote:
> >> > > >>>>
> >> > > >>>>> Any update on this KIP?
> >> > > >>>>>
> >> > > >>>>> On 10/7/19 3:35 PM, Matthias J. Sax wrote:
> >> > > >>>>>> Sorry for the late reply. The 2.4 deadline
kept us quite busy.
> >> > > >>>>>>
> >> > > >>>>>> About `StateStoreMigratedException`:
> >> > > >>>>>>
> >> > > >>>>>> Why is it only thrown if the state is REBALANCING?
A store
> >> might be
> >> > > >>>>>> migrated during a rebalance, and Kafka Streams
might resume
> >> back to
> >> > > >>>>>> RUNNING state and afterward somebody tries
to use an old store
> >> > > >> handle.
> >> > > >>>>>> Also, if state is REBALANCING, should we
throw
> >> > > >>>>>> `StreamThreadRebalancingException`? Hence,
I think
> >> > > >>>>>> `StateStoreMigratedException` does only make
sense during
> >> `RUNNING`
> >> > > >>>>> state.
> >> > > >>>>>>
> >> > > >>>>>>
> >> > > >>>>>> Why do we need to distinguish between
> >> > > >> `KafkaStreamsNotRunningException`
> >> > > >>>>>> and `StateStoreNotAvailableException`?
> >> > > >>>>>>
> >> > > >>>>>>
> >> > > >>>>>> Last, why do we distinguish between `KafkaStreams`
instance and
> >> > > >>>>>> `StreamsThread`? To me, it seems we should
always refer to the
> >> > > >> instance,
> >> > > >>>>>> because that is the level of granularity
in which we
> >> enable/disable
> >> > > >> IQ
> >> > > >>>>> atm.
> >> > > >>>>>>
> >> > > >>>>>>
> >> > > >>>>>> Last, for `StateStoreMigratedException`,
I would add that a
> >> user
> >> > > >> need to
> >> > > >>>>>> rediscover the store and cannot blindly retry
as the store
> >> handle is
> >> > > >>>>>> invalid and a new store handle must be retrieved.
That is a
> >> > > >> difference
> >> > > >>>>>> to `StreamThreadRebalancingException` that
allows for "blind"
> >> > > retries
> >> > > >>>>>> that either resolve (if the store is still
on the same instance
> >> > > after
> >> > > >>>>>> rebalancing finishes, or changes to
> >> `StateStoreMigratedException` if
> >> > > >> the
> >> > > >>>>>> store was migrated away during rebalancing).
> >> > > >>>>>>
> >> > > >>>>>>
> >> > > >>>>>>
> >> > > >>>>>> -Matthias
> >> > > >>>>>>
> >> > > >>>>>> On 8/9/19 10:20 AM, Vito Jeng wrote:
> >> > > >>>>>>> My bad. The short link `https://shorturl.at/CDNT9`
> >> <https://shorturl.at/CDNT9>
> >> > > <https://shorturl.at/CDNT9>
> >> > > >> <https://shorturl.at/CDNT9>
> >> > > >>>>> <https://shorturl.at/CDNT9>
> >> > > >>>>>>> <https://shorturl.at/CDNT9> seems
incorrect.
> >> > > >>>>>>>
> >> > > >>>>>>> Please use the following instead: https://shorturl.at/bkKQU
> >> > > >>>>>>>
> >> > > >>>>>>>
> >> > > >>>>>>> ---
> >> > > >>>>>>> Vito
> >> > > >>>>>>>
> >> > > >>>>>>>
> >> > > >>>>>>> On Fri, Aug 9, 2019 at 10:53 AM Vito
Jeng <
> >> vito@is-land.com.tw>
> >> > > >> wrote:
> >> > > >>>>>>>
> >> > > >>>>>>>> Thanks, Matthias!
> >> > > >>>>>>>>
> >> > > >>>>>>>>> About `StreamThreadNotStartedException`:
> >> > > >>>>>>>>
> >> > > >>>>>>>> Thank you for explanation. I agree
with your opinion.
> >> > > >>>>>>>> `CompositeReadOnlyXxxStore#get()`
would never throw
> >> > > >>>>>>>> `StreamThreadNotStartedException`.
> >> > > >>>>>>>>
> >> > > >>>>>>>> For the case that corresponding thread
crashes after we
> >> handed out
> >> > > >> the
> >> > > >>>>>>>> store handle. We may throw `KafkaStreamsNotRunningException`
> >> or
> >> > > >>>>>>>> `StateStoreMigratedException`.
> >> > > >>>>>>>> In `StreamThreadStateStoreProvider`,
we would throw
> >> > > >>>>>>>> `KafkaStreamsNotRunningException`
when stream thread is not
> >> > > >> running(
> >> > > >>>>>>>> https://shorturl.at/CDNT9) or throw
> >> `StateStoreMigratedException`
> >> > > >> when
> >> > > >>>>>>>> store is closed(https://shorturl.at/hrvAN).
So I think we
> >> do not
> >> > > >> need
> >> > > >>>>> to
> >> > > >>>>>>>> add a new type for this case. Does
that make sense?
> >> > > >>>>>>>>
> >> > > >>>>>>>>
> >> > > >>>>>>>>> About `KafkaStreamsNotRunningException`
vs
> >> > > >>>>>>>> `StreamThreadNotRunningException`:
> >> > > >>>>>>>>
> >> > > >>>>>>>> I understand your point. I rename
> >> > > >> `StreamThreadNotRunningException` to
> >> > > >>>>>>>> `KafkaStreamsNotRunningException`.
> >> > > >>>>>>>>
> >> > > >>>>>>>>
> >> > > >>>>>>>> About check unknown state store names:
> >> > > >>>>>>>> Thank you for the hint. I add a new
type
> >> > > >> `UnknownStateStoreException`
> >> > > >>>>> for
> >> > > >>>>>>>> this case.
> >> > > >>>>>>>>
> >> > > >>>>>>>>
> >> > > >>>>>>>>> Also, we should still have fatal
exception
> >> > > >>>>>>>> `StateStoreNotAvailableException`?
Not sure why you remove
> >> it?
> >> > > >>>>>>>>
> >> > > >>>>>>>> Thank you point this, already add
it again.
> >> > > >>>>>>>>
> >> > > >>>>>>>> The KIP already updated, please take
a look.
> >> > > >>>>>>>>
> >> > > >>>>>>>> ---
> >> > > >>>>>>>> Vito
> >> > > >>>>>>>>
> >> > > >>>>>>>
> >> > > >>>>>>
> >> > > >>>>>
> >> > > >>>>>
> >> > > >>>>
> >> > > >>>
> >> > > >>>
> >> > > >>
> >> > > >
> >> > >
> >> > >
> >> >
> >>
> >
>

Mime
View raw message