kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vito Jeng <v...@is-land.com.tw>
Subject Re: [DISCUSS]KIP-216: IQ should throw different exceptions for different errors
Date Thu, 16 Jan 2020 03:14:32 GMT
Hi John,

About `StreamsNotStartedException is strange` --
The original idea came from Matthias, two years ago. :)
You can reference here:
https://mail-archives.apache.org/mod_mbox/kafka-dev/201806.mbox/%3c6c32083e-b63c-435b-521d-032d45cc518f@confluent.io%3e

About omitting the categorization --
It looks reasonable. I'm fine with omitting the categorization but not very
sure it is a good choice.
Does any other folks provide opinion?


Hi, folks,

Just update the KIP-216, please take a look.

---
Vito


On Thu, Jan 16, 2020 at 6:35 AM Vito Jeng <vito@is-land.com.tw> wrote:

>
> Hi, folks,
>
> Thank you suggestion, really appreciate it. :)
> I understand your concern. I'll merge StreamsNotRunningException and
> StateStoreNotAvailableException.
>
>
> ---
> Vito
>
>
> On Thu, Jan 16, 2020 at 6:22 AM John Roesler <vvcephei@apache.org> wrote:
>
>> Hey Vito,
>>
>> Yes, thanks for the KIP. Sorry the discussion has been so long.
>> Hopefully, we can close it out soon.
>>
>> I agree we can drop StreamsNotRunningException in favor of
>> just StateStoreNotAvailableException.
>>
>> Unfortunately, I have some higher-level concerns. The value
>> of these exceptions is that they tell you how to handle the
>> various situations that can arise while querying a distributed
>> data store.
>>
>> Ideally, as a caller, I should be able to just catch "retriable" or
>> "fatal" and handle them appropriately. Otherwise, there's no
>> point in having categories, and we should just have all the
>> exceptions extend InvalidStateStoreException.
>>
>> Presently, it's not possible to tell from just the
>> "retriable"/"fatal" distinction what to do. You  can tell
>> from the descriptions of the various exceptions. E.g.:
>>
>> Retriable:
>>  * StreamsRebalancingException: the exact same call
>>     should just be retried until the rebalance is complete
>>  * StateStoreMigratedException: the store handle is
>>     now invalid, so you need to re-discover the instance
>>     and get a new handle on that instance. In other words,
>>     the query itself may be valid, but the particular method
>>     invocation on this particular instance has encountered
>>     a fatal exception.
>>
>> Fatal:
>>  * UnknownStateStoreException: this is truly fatal. No amount
>>     of retrying or re-discovering is going to get you a handle on a
>>     store that doesn't exist in the cluster.
>>  * StateStoreNotAvailableException: this is actually recoverable,
>>     since the store might exist in the cluster, but isn't available on
>>     this particular instance (which is shut down or whatever).
>>
>> Personally, I'm not a fan of code bureaucracy, so I'm 100% fine
>> with omitting the categorization and just having 5 subclasses
>> of InvalidStateStoreException. Each of them would tell you
>> how to handle them, and it's not too many to really
>> understand and handle each one.
>>
>> If you really want to have a middle tier, I'd recommend:
>> * RetryableStateStoreException: the exact same call
>>     should be repeated.
>> * RecoverableStateStoreException: the store handle
>>     should be discarded and the caller should re-discover
>>     the location of the store and repeat the query on the
>>     correct instance.
>> * FatalStateStoreException: the query/request is totally
>>     invalid and will never succeed.
>>
>> However, attempting to categorize the proposed exceptions
>> reveals even problems with this categorization:
>> Retriable:
>> * StreamsRebalancingException
>> Recoverable:
>> * StateStoreMigratedException
>> * StreamsNotRunningException
>> Fatal:
>> * UnknownStateStoreException
>>
>> But StreamsNotStartedException is strange... It means that
>> one code path got a handle on a specific KafkaStreams object
>> instance and sent it a query before another code path
>> invoked the start() method on the exact same object instance.
>> It seems like the most likely scenario is that whoever wrote
>> the program just forgot to call start() before querying, in
>> which case, retrying isn't going to help, and a fatal exception
>> is more appropriate. I.e., it sounds like a "first 15 minutes
>> experience" problem, and making it fatal would be more
>> helpful. Even in a production context, there's no reason not
>> to sequence your application startup such that you don't
>> accept queries until after Streams is started. Thus, I guess
>> I'd categorize it under "fatal".
>>
>> Regardless of whether you make it fatal or retriable, you'd
>> still have a whole category with only one exception in it,
>> and the other two categories only have two exceptions.
>> Plus, as you pointed out in the KIP, you can't get all
>> exceptions in all cases anyway:
>> * store() can only throw NotStarted, NotRunning,
>>     and Unknown
>> * actual store queries can only throw Rebalancing,
>>     Migrated, and NotRunning
>>
>> Thus, in practice also, there are exactly three categories
>> and also exactly three exception types. It doesn't seem
>> like there's a great advantage to the categories here. To
>> avoid the categorization problem and also to clarify what
>> exceptions can actually be thrown in different circumstances,
>> it seems like we should just:
>> * get rid of the middle tier and make all the exceptions
>>     extend InvalidStateStoreException
>> * drop StateStoreNotAvailableException in favor of
>>     StreamsNotRunningException
>> * clearly document on all public methods which exceptions
>>     need to be handled
>>
>> How do you feel about this?
>> Thanks,
>> -John
>>
>> On Wed, Jan 15, 2020, at 15:13, Bill Bejeck wrote:
>> > Thanks for KIP Vito.
>> >
>> > Overall the KIP LGTM, but I'd have to agree with others on merging the
>> > `StreamsNotRunningException` and `StateStoreNotAvailableException`
>> classes.
>> >
>> > Since in both cases, the thread state is in `PENDING_SHUTDOWN ||
>> > NOT_RUNNING || ERROR` I'm not even sure how we could distinguish when to
>> > use the different
>> > exceptions.  Maybe a good middle ground would be to have a detailed
>> > exception message.
>> >
>> > The KIP freeze is close, so I think if we can agree on this, we can
>> wrap up
>> > the voting soon.
>> >
>> > Thanks,
>> > Bill
>> >
>> > On Tue, Jan 14, 2020 at 2:12 PM Matthias J. Sax <matthias@confluent.io>
>> > wrote:
>> >
>> > > Vito,
>> > >
>> > > It's still unclear to me what the advantage is, to have both
>> > > `StreamsNotRunningException` and `StateStoreNotAvailableException`?
>> > >
>> > > For both cased, the state is `PENDING_SHUTDOWN / NOT_RUNNING / ERROR`
>> > > and thus, for a user point of view, why does it matter if the store is
>> > > closed on not? I don't understand why/how this information would be
>> > > useful? Do you have a concrete example in mind how a user would react
>> > > differently to both exceptions?
>> > >
>> > >
>> > > @Vinoth: about `StreamsRebalancingException` -- to me, it seems best
>> to
>> > > actually do this on a per-query basis, ie, have an overload
>> > > `KafkaStreams#store(...)` that takes a boolean flag that allow to
>> > > _disable_ the exception and opt-in to query a active store during
>> > > recovery. However, as KIP-535 actually introduces this change in
>> > > behavior, I think KIP-216 should not cover this, but KIP-535 should be
>> > > updated. I'll follow up on the other KIP thread to raise this point.
>> > >
>> > >
>> > > -Matthias
>> > >
>> > > On 1/11/20 12:26 AM, Vito Jeng wrote:
>> > > > Hi, Matthias & Vinoth,
>> > > >
>> > > > Thanks for the feedback.
>> > > >
>> > > >> What is still unclear to me is, what we gain by having both
>> > > >> `StreamsNotRunningException` and
>> `StateStoreNotAvailableException`. Both
>> > > >> exception are thrown when KafkaStreams is in state
>> PENDING_SHUTDOWN /
>> > > >> NOT_RUNNING / ERROR. Hence, as a user what do I gain to know if
the
>> > > >> state store is closed on not -- I can't query it anyway? Maybe
I
>> miss
>> > > >> something thought?
>> > > >
>> > > > Yes, both `StreamsNotRunningException` and
>> > > > `StateStoreNotAvailableException` are fatal exception.
>> > > > But `StateStoreNotAvailableException` is fatal exception about state
>> > > store
>> > > > related.
>> > > > I think it would be helpful that if user need to distinguish these
>> two
>> > > > different case to handle it.
>> > > >
>> > > > I'm not very sure, does that make sense?
>> > > >
>> > > >
>> > > > ---
>> > > > Vito
>> > > >
>> > > >
>> > > > On Fri, Jan 10, 2020 at 3:35 AM Vinoth Chandar <vinoth@apache.org>
>> > > wrote:
>> > > >
>> > > >> +1 on merging `StreamsNotRunningException` and
>> > > >> `StateStoreNotAvailableException`, both exceptions are fatal
>> anyway. IMO
>> > > >> its best to have these exceptions be about the state store (and
not
>> > > streams
>> > > >> state), to easier understanding.
>> > > >>
>> > > >> Additionally, KIP-535 allows for querying of state stores in
>> rebalancing
>> > > >> state. So do we need the StreamsRebalancingException?
>> > > >>
>> > > >>
>> > > >> On 2020/01/09 03:38:11, "Matthias J. Sax" <matthias@confluent.io>
>> > > wrote:
>> > > >>> Sorry that I dropped the ball on this...
>> > > >>>
>> > > >>> Thanks for updating the KIP. Overall LGTM now. Feel free to
start
>> a
>> > > VOTE
>> > > >>> thread.
>> > > >>>
>> > > >>> What is still unclear to me is, what we gain by having both
>> > > >>> `StreamsNotRunningException` and
>> `StateStoreNotAvailableException`.
>> > > Both
>> > > >>> exception are thrown when KafkaStreams is in state
>> PENDING_SHUTDOWN /
>> > > >>> NOT_RUNNING / ERROR. Hence, as a user what do I gain to know
if
>> the
>> > > >>> state store is closed on not -- I can't query it anyway? Maybe
I
>> miss
>> > > >>> something thought?
>> > > >>>
>> > > >>>
>> > > >>> -Matthias
>> > > >>>
>> > > >>>
>> > > >>> On 11/3/19 6:07 PM, Vito Jeng wrote:
>> > > >>>> Sorry for the late reply, thanks for the review.
>> > > >>>>
>> > > >>>>
>> > > >>>>> About `StateStoreMigratedException`:
>> > > >>>>>
>> > > >>>>> Why is it only thrown if the state is REBALANCING?
A store
>> might be
>> > > >>>>> migrated during a rebalance, and Kafka Streams might
resume
>> back to
>> > > >>>>> RUNNING state and afterward somebody tries to use
an old store
>> > > handle.
>> > > >>>>> Also, if state is REBALANCING, should we throw
>> > > >>>>> `StreamThreadRebalancingException`? Hence, I think
>> > > >>>>> `StateStoreMigratedException` does only make sense
during
>> `RUNNING`
>> > > >> state.
>> > > >>>>>
>> > > >>>>
>> > > >>>> Thank you point this, already updated.
>> > > >>>>
>> > > >>>>
>> > > >>>> Why do we need to distinguish between
>> > > `KafkaStreamsNotRunningException`
>> > > >>>>> and `StateStoreNotAvailableException`?
>> > > >>>>>
>> > > >>>>
>> > > >>>> `KafkaStreamsNotRunningException` may be caused by various
>> reasons, I
>> > > >> think
>> > > >>>> it would be helpful that the
>> > > >>>> user can distinguish whether it is caused by the state
store
>> closed.
>> > > >>>> (Maybe I am wrong...)
>> > > >>>>
>> > > >>>>
>> > > >>>> Last, why do we distinguish between `KafkaStreams` instance
and
>> > > >>>>> `StreamsThread`? To me, it seems we should always
refer to the
>> > > >> instance,
>> > > >>>>> because that is the level of granularity in which
we
>> enable/disable
>> > > >> IQ atm.
>> > > >>>>>
>> > > >>>>
>> > > >>>> Totally agree. Do you mean the naming of state store exceptions?
>> > > >>>> I don't have special reason to distinguish these two.
>> > > >>>> Your suggestion look more reasonable for the exception
naming.
>> > > >>>>
>> > > >>>>
>> > > >>>> Last, for `StateStoreMigratedException`, I would add that
a user
>> need
>> > > >> to
>> > > >>>>> rediscover the store and cannot blindly retry as the
store
>> handle is
>> > > >>>>> invalid and a new store handle must be retrieved.
That is a
>> > > difference
>> > > >>>>> to `StreamThreadRebalancingException` that allows
for "blind"
>> retries
>> > > >>>>> that either resolve (if the store is still on the
same instance
>> after
>> > > >>>>> rebalancing finishes, or changes to
>> `StateStoreMigratedException` if
>> > > >> the
>> > > >>>>> store was migrated away during rebalancing).
>> > > >>>>>
>> > > >>>>
>> > > >>>> Nice, it's great! Thank you.
>> > > >>>>
>> > > >>>>
>> > > >>>> The KIP already updated, please take a look. :)
>> > > >>>>
>> > > >>>>
>> > > >>>>
>> > > >>>> On Wed, Oct 23, 2019 at 1:48 PM Matthias J. Sax <
>> > > matthias@confluent.io
>> > > >>>
>> > > >>>> wrote:
>> > > >>>>
>> > > >>>>> Any update on this KIP?
>> > > >>>>>
>> > > >>>>> On 10/7/19 3:35 PM, Matthias J. Sax wrote:
>> > > >>>>>> Sorry for the late reply. The 2.4 deadline kept
us quite busy.
>> > > >>>>>>
>> > > >>>>>> About `StateStoreMigratedException`:
>> > > >>>>>>
>> > > >>>>>> Why is it only thrown if the state is REBALANCING?
A store
>> might be
>> > > >>>>>> migrated during a rebalance, and Kafka Streams
might resume
>> back to
>> > > >>>>>> RUNNING state and afterward somebody tries to
use an old store
>> > > >> handle.
>> > > >>>>>> Also, if state is REBALANCING, should we throw
>> > > >>>>>> `StreamThreadRebalancingException`? Hence, I think
>> > > >>>>>> `StateStoreMigratedException` does only make sense
during
>> `RUNNING`
>> > > >>>>> state.
>> > > >>>>>>
>> > > >>>>>>
>> > > >>>>>> Why do we need to distinguish between
>> > > >> `KafkaStreamsNotRunningException`
>> > > >>>>>> and `StateStoreNotAvailableException`?
>> > > >>>>>>
>> > > >>>>>>
>> > > >>>>>> Last, why do we distinguish between `KafkaStreams`
instance and
>> > > >>>>>> `StreamsThread`? To me, it seems we should always
refer to the
>> > > >> instance,
>> > > >>>>>> because that is the level of granularity in which
we
>> enable/disable
>> > > >> IQ
>> > > >>>>> atm.
>> > > >>>>>>
>> > > >>>>>>
>> > > >>>>>> Last, for `StateStoreMigratedException`, I would
add that a
>> user
>> > > >> need to
>> > > >>>>>> rediscover the store and cannot blindly retry
as the store
>> handle is
>> > > >>>>>> invalid and a new store handle must be retrieved.
That is a
>> > > >> difference
>> > > >>>>>> to `StreamThreadRebalancingException` that allows
for "blind"
>> > > retries
>> > > >>>>>> that either resolve (if the store is still on
the same instance
>> > > after
>> > > >>>>>> rebalancing finishes, or changes to
>> `StateStoreMigratedException` if
>> > > >> the
>> > > >>>>>> store was migrated away during rebalancing).
>> > > >>>>>>
>> > > >>>>>>
>> > > >>>>>>
>> > > >>>>>> -Matthias
>> > > >>>>>>
>> > > >>>>>> On 8/9/19 10:20 AM, Vito Jeng wrote:
>> > > >>>>>>> My bad. The short link `https://shorturl.at/CDNT9`
>> <https://shorturl.at/CDNT9>
>> > > <https://shorturl.at/CDNT9>
>> > > >> <https://shorturl.at/CDNT9>
>> > > >>>>> <https://shorturl.at/CDNT9>
>> > > >>>>>>> <https://shorturl.at/CDNT9> seems incorrect.
>> > > >>>>>>>
>> > > >>>>>>> Please use the following instead: https://shorturl.at/bkKQU
>> > > >>>>>>>
>> > > >>>>>>>
>> > > >>>>>>> ---
>> > > >>>>>>> Vito
>> > > >>>>>>>
>> > > >>>>>>>
>> > > >>>>>>> On Fri, Aug 9, 2019 at 10:53 AM Vito Jeng
<
>> vito@is-land.com.tw>
>> > > >> wrote:
>> > > >>>>>>>
>> > > >>>>>>>> Thanks, Matthias!
>> > > >>>>>>>>
>> > > >>>>>>>>> About `StreamThreadNotStartedException`:
>> > > >>>>>>>>
>> > > >>>>>>>> Thank you for explanation. I agree with
your opinion.
>> > > >>>>>>>> `CompositeReadOnlyXxxStore#get()` would
never throw
>> > > >>>>>>>> `StreamThreadNotStartedException`.
>> > > >>>>>>>>
>> > > >>>>>>>> For the case that corresponding thread
crashes after we
>> handed out
>> > > >> the
>> > > >>>>>>>> store handle. We may throw `KafkaStreamsNotRunningException`
>> or
>> > > >>>>>>>> `StateStoreMigratedException`.
>> > > >>>>>>>> In `StreamThreadStateStoreProvider`, we
would throw
>> > > >>>>>>>> `KafkaStreamsNotRunningException` when
stream thread is not
>> > > >> running(
>> > > >>>>>>>> https://shorturl.at/CDNT9) or throw
>> `StateStoreMigratedException`
>> > > >> when
>> > > >>>>>>>> store is closed(https://shorturl.at/hrvAN).
So I think we
>> do not
>> > > >> need
>> > > >>>>> to
>> > > >>>>>>>> add a new type for this case. Does that
make sense?
>> > > >>>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>>> About `KafkaStreamsNotRunningException`
vs
>> > > >>>>>>>> `StreamThreadNotRunningException`:
>> > > >>>>>>>>
>> > > >>>>>>>> I understand your point. I rename
>> > > >> `StreamThreadNotRunningException` to
>> > > >>>>>>>> `KafkaStreamsNotRunningException`.
>> > > >>>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>> About check unknown state store names:
>> > > >>>>>>>> Thank you for the hint. I add a new type
>> > > >> `UnknownStateStoreException`
>> > > >>>>> for
>> > > >>>>>>>> this case.
>> > > >>>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>>> Also, we should still have fatal exception
>> > > >>>>>>>> `StateStoreNotAvailableException`? Not
sure why you remove
>> it?
>> > > >>>>>>>>
>> > > >>>>>>>> Thank you point this, already add it again.
>> > > >>>>>>>>
>> > > >>>>>>>> The KIP already updated, please take a
look.
>> > > >>>>>>>>
>> > > >>>>>>>> ---
>> > > >>>>>>>> Vito
>> > > >>>>>>>>
>> > > >>>>>>>
>> > > >>>>>>
>> > > >>>>>
>> > > >>>>>
>> > > >>>>
>> > > >>>
>> > > >>>
>> > > >>
>> > > >
>> > >
>> > >
>> >
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message