kafka-dev mailing list archives

From Bill Bejeck <bbej...@gmail.com>
Subject Re: [DISCUSS]KIP-216: IQ should throw different exceptions for different errors
Date Thu, 16 Jan 2020 14:38:23 GMT
Vito,

Thanks for the updates, the KIP LGTM.

-Bill

On Wed, Jan 15, 2020 at 11:31 PM John Roesler <vvcephei@apache.org> wrote:

> Hi Vito,
>
> Haha, your archive game is on point!
>
> What Matthias said in that email is essentially what I figured was the
> rationale. It makes sense, but the point I was making is that this really
> doesn’t seem like a good way to structure a production app. On the other
> hand, considering the exception fatal has a good chance of avoiding a
> frustrating debug session if you just forgot to call start.
>
> Nevertheless, if we omit the categorization, it’s moot.
>
> It would be easy to add a categorization layer later if we want it, but
> not very easy to change it if we get it wrong.
>
> Thanks for your consideration!
> -John
>
> On Wed, Jan 15, 2020, at 21:14, Vito Jeng wrote:
> > Hi John,
> >
> > About `StreamsNotStartedException is strange` --
> > The original idea came from Matthias, two years ago. :)
> > You can reference here:
> >
> > https://mail-archives.apache.org/mod_mbox/kafka-dev/201806.mbox/%3c6c32083e-b63c-435b-521d-032d45cc518f@confluent.io%3e
> >
> > About omitting the categorization --
> > It looks reasonable. I'm fine with omitting the categorization, but I'm
> > not very sure it is a good choice.
> > Do any other folks have an opinion?
> >
> >
> > Hi, folks,
> >
> > I just updated KIP-216, please take a look.
> >
> > ---
> > Vito
> >
> >
> > On Thu, Jan 16, 2020 at 6:35 AM Vito Jeng <vito@is-land.com.tw> wrote:
> >
> > >
> > > Hi, folks,
> > >
> > > Thank you for the suggestions, I really appreciate it. :)
> > > I understand your concern. I'll merge StreamsNotRunningException and
> > > StateStoreNotAvailableException.
> > >
> > >
> > > ---
> > > Vito
> > >
> > >
> > > On Thu, Jan 16, 2020 at 6:22 AM John Roesler <vvcephei@apache.org> wrote:
> > >
> > >> Hey Vito,
> > >>
> > >> Yes, thanks for the KIP. Sorry the discussion has been so long.
> > >> Hopefully, we can close it out soon.
> > >>
> > >> I agree we can drop StreamsNotRunningException in favor of
> > >> just StateStoreNotAvailableException.
> > >>
> > >> Unfortunately, I have some higher-level concerns. The value
> > >> of these exceptions is that they tell you how to handle the
> > >> various situations that can arise while querying a distributed
> > >> data store.
> > >>
> > >> Ideally, as a caller, I should be able to just catch "retriable" or
> > >> "fatal" and handle them appropriately. Otherwise, there's no
> > >> point in having categories, and we should just have all the
> > >> exceptions extend InvalidStateStoreException.
> > >>
> > >> Presently, it's not possible to tell from just the
> > >> "retriable"/"fatal" distinction what to do. You can tell
> > >> from the descriptions of the various exceptions. E.g.:
> > >>
> > >> Retriable:
> > >>  * StreamsRebalancingException: the exact same call
> > >>     should just be retried until the rebalance is complete
> > >>  * StateStoreMigratedException: the store handle is
> > >>     now invalid, so you need to re-discover the instance
> > >>     and get a new handle on that instance. In other words,
> > >>     the query itself may be valid, but the particular method
> > >>     invocation on this particular instance has encountered
> > >>     a fatal exception.
> > >>
> > >> Fatal:
> > >>  * UnknownStateStoreException: this is truly fatal. No amount
> > >>     of retrying or re-discovering is going to get you a handle on a
> > >>     store that doesn't exist in the cluster.
> > >>  * StateStoreNotAvailableException: this is actually recoverable,
> > >>     since the store might exist in the cluster, but isn't available on
> > >>     this particular instance (which is shut down or whatever).
> > >>
> > >> Personally, I'm not a fan of code bureaucracy, so I'm 100% fine
> > >> with omitting the categorization and just having 5 subclasses
> > >> of InvalidStateStoreException. Each of them would tell you
> > >> how to handle them, and it's not too many to really
> > >> understand and handle each one.
> > >>
> > >> If you really want to have a middle tier, I'd recommend:
> > >> * RetryableStateStoreException: the exact same call
> > >>     should be repeated.
> > >> * RecoverableStateStoreException: the store handle
> > >>     should be discarded and the caller should re-discover
> > >>     the location of the store and repeat the query on the
> > >>     correct instance.
> > >> * FatalStateStoreException: the query/request is totally
> > >>     invalid and will never succeed.
> > >>
> > >> However, attempting to categorize the proposed exceptions
> > >> reveals problems even with this categorization:
> > >> Retriable:
> > >> * StreamsRebalancingException
> > >> Recoverable:
> > >> * StateStoreMigratedException
> > >> * StreamsNotRunningException
> > >> Fatal:
> > >> * UnknownStateStoreException
> > >>
> > >> But StreamsNotStartedException is strange... It means that
> > >> one code path got a handle on a specific KafkaStreams object
> > >> instance and sent it a query before another code path
> > >> invoked the start() method on the exact same object instance.
> > >> It seems like the most likely scenario is that whoever wrote
> > >> the program just forgot to call start() before querying, in
> > >> which case, retrying isn't going to help, and a fatal exception
> > >> is more appropriate. I.e., it sounds like a "first 15 minutes
> > >> experience" problem, and making it fatal would be more
> > >> helpful. Even in a production context, there's no reason not
> > >> to sequence your application startup such that you don't
> > >> accept queries until after Streams is started. Thus, I guess
> > >> I'd categorize it under "fatal".
> > >>
> > >> Regardless of whether you make it fatal or retriable, you'd
> > >> still have a whole category with only one exception in it,
> > >> and the other two categories only have two exceptions.
> > >> Plus, as you pointed out in the KIP, you can't get all
> > >> exceptions in all cases anyway:
> > >> * store() can only throw NotStarted, NotRunning,
> > >>     and Unknown
> > >> * actual store queries can only throw Rebalancing,
> > >>     Migrated, and NotRunning
> > >>
> > >> Thus, in practice also, there are exactly three categories
> > >> and also exactly three exception types. It doesn't seem
> > >> like there's a great advantage to the categories here. To
> > >> avoid the categorization problem and also to clarify what
> > >> exceptions can actually be thrown in different circumstances,
> > >> it seems like we should just:
> > >> * get rid of the middle tier and make all the exceptions
> > >>     extend InvalidStateStoreException
> > >> * drop StateStoreNotAvailableException in favor of
> > >>     StreamsNotRunningException
> > >> * clearly document on all public methods which exceptions
> > >>     need to be handled
> > >>
> > >> How do you feel about this?
> > >> Thanks,
> > >> -John
> > >>
> > >> On Wed, Jan 15, 2020, at 15:13, Bill Bejeck wrote:
> > >> > Thanks for the KIP, Vito.
> > >> >
> > >> > Overall the KIP LGTM, but I'd have to agree with others on merging the
> > >> > `StreamsNotRunningException` and `StateStoreNotAvailableException`
> > >> > classes.
> > >> >
> > >> > Since in both cases the thread state is `PENDING_SHUTDOWN ||
> > >> > NOT_RUNNING || ERROR`, I'm not even sure how we could distinguish when
> > >> > to use the different exceptions. Maybe a good middle ground would be
> > >> > to have a detailed exception message.
> > >> >
> > >> > The KIP freeze is close, so I think if we can agree on this, we can
> > >> > wrap up the voting soon.
> > >> >
> > >> > Thanks,
> > >> > Bill
> > >> >
> > >> > On Tue, Jan 14, 2020 at 2:12 PM Matthias J. Sax <matthias@confluent.io>
> > >> > wrote:
> > >> >
> > >> > > Vito,
> > >> > >
> > >> > > It's still unclear to me what the advantage is of having both
> > >> > > `StreamsNotRunningException` and `StateStoreNotAvailableException`.
> > >> > >
> > >> > > For both cases, the state is `PENDING_SHUTDOWN / NOT_RUNNING / ERROR`
> > >> > > and thus, from a user's point of view, why does it matter if the
> > >> > > store is closed or not? I don't understand why/how this information
> > >> > > would be useful. Do you have a concrete example in mind of how a user
> > >> > > would react differently to both exceptions?
> > >> > >
> > >> > >
> > >> > > @Vinoth: about `StreamsRebalancingException` -- to me, it seems best
> > >> > > to actually do this on a per-query basis, i.e., have an overload
> > >> > > `KafkaStreams#store(...)` that takes a boolean flag that allows one to
> > >> > > _disable_ the exception and opt in to querying an active store during
> > >> > > recovery. However, as KIP-535 actually introduces this change in
> > >> > > behavior, I think KIP-216 should not cover this, but KIP-535 should be
> > >> > > updated. I'll follow up on the other KIP thread to raise this point.
> > >> > >
> > >> > >
> > >> > > -Matthias
> > >> > >
> > >> > > On 1/11/20 12:26 AM, Vito Jeng wrote:
> > >> > > > Hi, Matthias & Vinoth,
> > >> > > >
> > >> > > > Thanks for the feedback.
> > >> > > >
> > >> > > >> What is still unclear to me is what we gain by having both
> > >> > > >> `StreamsNotRunningException` and `StateStoreNotAvailableException`.
> > >> > > >> Both exceptions are thrown when KafkaStreams is in state
> > >> > > >> PENDING_SHUTDOWN / NOT_RUNNING / ERROR. Hence, as a user, what do I
> > >> > > >> gain from knowing if the state store is closed or not -- I can't
> > >> > > >> query it anyway? Maybe I'm missing something, though?
> > >> > > >
> > >> > > > Yes, both `StreamsNotRunningException` and
> > >> > > > `StateStoreNotAvailableException` are fatal exceptions.
> > >> > > > But `StateStoreNotAvailableException` is a fatal exception related
> > >> > > > to the state store.
> > >> > > > I think it would be helpful if users need to distinguish these two
> > >> > > > different cases in order to handle them.
> > >> > > >
> > >> > > > I'm not very sure, does that make sense?
> > >> > > >
> > >> > > >
> > >> > > > ---
> > >> > > > Vito
> > >> > > >
> > >> > > >
> > >> > > > On Fri, Jan 10, 2020 at 3:35 AM Vinoth Chandar <vinoth@apache.org>
> > >> > > > wrote:
> > >> > > >
> > >> > > >> +1 on merging `StreamsNotRunningException` and
> > >> > > >> `StateStoreNotAvailableException`, both exceptions are fatal
> > >> > > >> anyway. IMO it's best to have these exceptions be about the state
> > >> > > >> store (and not streams state), for easier understanding.
> > >> > > >>
> > >> > > >> Additionally, KIP-535 allows for querying of state stores in
> > >> > > >> rebalancing state. So do we need the StreamsRebalancingException?
> > >> > > >>
> > >> > > >>
> > >> > > >> On 2020/01/09 03:38:11, "Matthias J. Sax" <matthias@confluent.io> wrote:
> > >> > > >>> Sorry that I dropped the ball on this...
> > >> > > >>>
> > >> > > >>> Thanks for updating the KIP. Overall LGTM now. Feel free to start a
> > >> > > >>> VOTE thread.
> > >> > > >>>
> > >> > > >>> What is still unclear to me is what we gain by having both
> > >> > > >>> `StreamsNotRunningException` and `StateStoreNotAvailableException`.
> > >> > > >>> Both exceptions are thrown when KafkaStreams is in state
> > >> > > >>> PENDING_SHUTDOWN / NOT_RUNNING / ERROR. Hence, as a user, what do I
> > >> > > >>> gain from knowing if the state store is closed or not -- I can't
> > >> > > >>> query it anyway? Maybe I'm missing something, though?
> > >> > > >>>
> > >> > > >>>
> > >> > > >>> -Matthias
> > >> > > >>>
> > >> > > >>>
> > >> > > >>> On 11/3/19 6:07 PM, Vito Jeng wrote:
> > >> > > >>>> Sorry for the late reply, thanks for the review.
> > >> > > >>>>
> > >> > > >>>>
> > >> > > >>>>> About `StateStoreMigratedException`:
> > >> > > >>>>>
> > >> > > >>>>> Why is it only thrown if the state is REBALANCING? A store might be
> > >> > > >>>>> migrated during a rebalance, and Kafka Streams might resume back to
> > >> > > >>>>> RUNNING state and afterward somebody tries to use an old store handle.
> > >> > > >>>>> Also, if state is REBALANCING, should we throw
> > >> > > >>>>> `StreamThreadRebalancingException`? Hence, I think
> > >> > > >>>>> `StateStoreMigratedException` only makes sense during `RUNNING` state.
> > >> > > >>>>>
> > >> > > >>>>
> > >> > > >>>> Thank you for pointing this out, it's already updated.
> > >> > > >>>>
> > >> > > >>>>
> > >> > > >>>>> Why do we need to distinguish between `KafkaStreamsNotRunningException`
> > >> > > >>>>> and `StateStoreNotAvailableException`?
> > >> > > >>>>>
> > >> > > >>>>
> > >> > > >>>> `KafkaStreamsNotRunningException` may be caused by various reasons. I
> > >> > > >>>> think it would be helpful if the user can distinguish whether it is
> > >> > > >>>> caused by the state store being closed.
> > >> > > >>>> (Maybe I am wrong...)
> > >> > > >>>>
> > >> > > >>>>
> > >> > > >>>>> Last, why do we distinguish between `KafkaStreams` instance and
> > >> > > >>>>> `StreamsThread`? To me, it seems we should always refer to the instance,
> > >> > > >>>>> because that is the level of granularity in which we enable/disable
> > >> > > >>>>> IQ atm.
> > >> > > >>>>>
> > >> > > >>>>
> > >> > > >>>> Totally agree. Do you mean the naming of state store exceptions?
> > >> > > >>>> I don't have a special reason to distinguish these two.
> > >> > > >>>> Your suggestion looks more reasonable for the exception naming.
> > >> > > >>>>
> > >> > > >>>>
> > >> > > >>>>> Last, for `StateStoreMigratedException`, I would add that a user needs
> > >> > > >>>>> to rediscover the store and cannot blindly retry as the store handle is
> > >> > > >>>>> invalid and a new store handle must be retrieved. That is a difference
> > >> > > >>>>> to `StreamThreadRebalancingException` that allows for "blind" retries
> > >> > > >>>>> that either resolve (if the store is still on the same instance after
> > >> > > >>>>> rebalancing finishes) or change to `StateStoreMigratedException` (if
> > >> > > >>>>> the store was migrated away during rebalancing).
> > >> > > >>>>>
> > >> > > >>>>
> > >> > > >>>> Nice, it's great! Thank you.
> > >> > > >>>>
> > >> > > >>>>
> > >> > > >>>> The KIP is already updated, please take a look. :)
> > >> > > >>>>
> > >> > > >>>>
> > >> > > >>>>
> > >> > > >>>> On Wed, Oct 23, 2019 at 1:48 PM Matthias J. Sax <matthias@confluent.io>
> > >> > > >>>> wrote:
> > >> > > >>>>
> > >> > > >>>>> Any update on this KIP?
> > >> > > >>>>>
> > >> > > >>>>> On 10/7/19 3:35 PM, Matthias J. Sax wrote:
> > >> > > >>>>>> Sorry for the late reply. The 2.4 deadline kept us quite busy.
> > >> > > >>>>>>
> > >> > > >>>>>> About `StateStoreMigratedException`:
> > >> > > >>>>>>
> > >> > > >>>>>> Why is it only thrown if the state is REBALANCING? A store might be
> > >> > > >>>>>> migrated during a rebalance, and Kafka Streams might resume back to
> > >> > > >>>>>> RUNNING state and afterward somebody tries to use an old store handle.
> > >> > > >>>>>> Also, if state is REBALANCING, should we throw
> > >> > > >>>>>> `StreamThreadRebalancingException`? Hence, I think
> > >> > > >>>>>> `StateStoreMigratedException` only makes sense during `RUNNING` state.
> > >> > > >>>>>>
> > >> > > >>>>>>
> > >> > > >>>>>> Why do we need to distinguish between `KafkaStreamsNotRunningException`
> > >> > > >>>>>> and `StateStoreNotAvailableException`?
> > >> > > >>>>>>
> > >> > > >>>>>>
> > >> > > >>>>>> Last, why do we distinguish between `KafkaStreams` instance and
> > >> > > >>>>>> `StreamsThread`? To me, it seems we should always refer to the instance,
> > >> > > >>>>>> because that is the level of granularity in which we enable/disable
> > >> > > >>>>>> IQ atm.
> > >> > > >>>>>>
> > >> > > >>>>>>
> > >> > > >>>>>> Last, for `StateStoreMigratedException`, I would add that a user needs
> > >> > > >>>>>> to rediscover the store and cannot blindly retry as the store handle is
> > >> > > >>>>>> invalid and a new store handle must be retrieved. That is a difference
> > >> > > >>>>>> to `StreamThreadRebalancingException` that allows for "blind" retries
> > >> > > >>>>>> that either resolve (if the store is still on the same instance after
> > >> > > >>>>>> rebalancing finishes) or change to `StateStoreMigratedException` (if
> > >> > > >>>>>> the store was migrated away during rebalancing).
> > >> > > >>>>>>
> > >> > > >>>>>>
> > >> > > >>>>>>
> > >> > > >>>>>> -Matthias
> > >> > > >>>>>>
> > >> > > >>>>>> On 8/9/19 10:20 AM, Vito Jeng wrote:
> > >> > > >>>>>>> My bad. The short link `https://shorturl.at/CDNT9` seems incorrect.
> > >> > > >>>>>>>
> > >> > > >>>>>>> Please use the following instead: https://shorturl.at/bkKQU
> > >> > > >>>>>>>
> > >> > > >>>>>>>
> > >> > > >>>>>>> ---
> > >> > > >>>>>>> Vito
> > >> > > >>>>>>>
> > >> > > >>>>>>>
> > >> > > >>>>>>> On Fri, Aug 9, 2019 at 10:53 AM Vito Jeng <vito@is-land.com.tw>
> > >> > > >>>>>>> wrote:
> > >> > > >>>>>>>
> > >> > > >>>>>>>> Thanks, Matthias!
> > >> > > >>>>>>>>
> > >> > > >>>>>>>>> About `StreamThreadNotStartedException`:
> > >> > > >>>>>>>>
> > >> > > >>>>>>>> Thank you for the explanation. I agree with your opinion.
> > >> > > >>>>>>>> `CompositeReadOnlyXxxStore#get()` would never throw
> > >> > > >>>>>>>> `StreamThreadNotStartedException`.
> > >> > > >>>>>>>>
> > >> > > >>>>>>>> For the case where the corresponding thread crashes after we handed
> > >> > > >>>>>>>> out the store handle, we may throw `KafkaStreamsNotRunningException`
> > >> > > >>>>>>>> or `StateStoreMigratedException`.
> > >> > > >>>>>>>> In `StreamThreadStateStoreProvider`, we would throw
> > >> > > >>>>>>>> `KafkaStreamsNotRunningException` when the stream thread is not
> > >> > > >>>>>>>> running (https://shorturl.at/CDNT9) or throw
> > >> > > >>>>>>>> `StateStoreMigratedException` when the store is closed
> > >> > > >>>>>>>> (https://shorturl.at/hrvAN). So I think we do not need to add a new
> > >> > > >>>>>>>> type for this case. Does that make sense?
> > >> > > >>>>>>>>
> > >> > > >>>>>>>>
> > >> > > >>>>>>>>> About `KafkaStreamsNotRunningException` vs
> > >> > > >>>>>>>>> `StreamThreadNotRunningException`:
> > >> > > >>>>>>>>
> > >> > > >>>>>>>> I understand your point. I renamed `StreamThreadNotRunningException`
> > >> > > >>>>>>>> to `KafkaStreamsNotRunningException`.
> > >> > > >>>>>>>>
> > >> > > >>>>>>>>
> > >> > > >>>>>>>> About checking unknown state store names:
> > >> > > >>>>>>>> Thank you for the hint. I added a new type `UnknownStateStoreException`
> > >> > > >>>>>>>> for this case.
> > >> > > >>>>>>>>
> > >> > > >>>>>>>>
> > >> > > >>>>>>>>> Also, we should still have fatal exception
> > >> > > >>>>>>>>> `StateStoreNotAvailableException`? Not sure why you removed it?
> > >> > > >>>>>>>>
> > >> > > >>>>>>>> Thank you for pointing this out, I've already added it again.
> > >> > > >>>>>>>>
> > >> > > >>>>>>>> The KIP is already updated, please take a look.
> > >> > > >>>>>>>>
> > >> > > >>>>>>>> ---
> > >> > > >>>>>>>> Vito
> > >> > > >>>>>>>>
> > >> > > >>>>>>>
> > >> > > >>>>>>
> > >> > > >>>>>
> > >> > > >>>>>
> > >> > > >>>>
> > >> > > >>>
> > >> > > >>>
> > >> > > >>
> > >> > > >
> > >> > >
> > >> > >
> > >> >
> > >>
> > >
> >
>
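
For readers following the thread: the handling difference discussed above (retry the same call while rebalancing, re-discover the store after a migration, give up on anything else) could look roughly like the sketch below. It is only an illustration under stated assumptions. The exception classes are local stand-ins named after the types proposed in the discussion, declared here so the example compiles without Kafka Streams on the classpath, and `QueryRetry`, `storeLookup`, `read` and `maxAttempts` are hypothetical names, not part of any Kafka API.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Local stand-ins named after the exception types discussed in this thread.
// Assumption: a flat hierarchy under InvalidStateStoreException, no middle tier.
class InvalidStateStoreException extends RuntimeException {
    InvalidStateStoreException(String message) { super(message); }
}

class StreamsRebalancingException extends InvalidStateStoreException {
    StreamsRebalancingException(String message) { super(message); }
}

class StateStoreMigratedException extends InvalidStateStoreException {
    StateStoreMigratedException(String message) { super(message); }
}

class UnknownStateStoreException extends InvalidStateStoreException {
    UnknownStateStoreException(String message) { super(message); }
}

// Hypothetical helper: one way a caller might react to each exception type.
final class QueryRetry {
    private QueryRetry() { }

    // storeLookup re-discovers the store and returns a fresh handle;
    // read performs the actual query against a handle.
    static <S, R> R query(Supplier<S> storeLookup, Function<S, R> read, int maxAttempts) {
        S store = storeLookup.get();
        for (int attempt = 1; ; attempt++) {
            try {
                return read.apply(store);
            } catch (StreamsRebalancingException e) {
                // "Blind" retry: the same call on the same handle may succeed later.
                // A real caller would probably back off before the next attempt.
                if (attempt >= maxAttempts) throw e;
            } catch (StateStoreMigratedException e) {
                // The handle is stale: discard it and look the store up again.
                if (attempt >= maxAttempts) throw e;
                store = storeLookup.get();
            }
            // Any other InvalidStateStoreException (for example
            // UnknownStateStoreException) is not caught and propagates to the caller.
        }
    }
}
```

A caller would pass its own lookup and read functions, for instance `QueryRetry.query(() -> lookupStore(), s -> s.get(key), 3)`, where `lookupStore` and `key` are placeholders. With a flat hierarchy like this, the per-exception handling lives in the catch blocks rather than in a "retriable"/"fatal" middle tier.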
