kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "John Roesler" <vvcep...@apache.org>
Subject Re: [DISCUSS]KIP-216: IQ should throw different exceptions for different errors
Date Wed, 15 Jan 2020 22:22:08 GMT
Hey Vito,

Yes, thanks for the KIP. Sorry the discussion has been so long. 
Hopefully, we can close it out soon.

I agree we can drop StreamsNotRunningException in favor of 
just StateStoreNotAvailableException.

Unfortunately, I have some higher-level concerns. The value 
of these exceptions is that they tell you how to handle the 
various situations that can arise while querying a distributed 
data store.

Ideally, as a caller, I should be able to just catch "retriable" or 
"fatal" and handle them appropriately. Otherwise, there's no 
point in having categories, and we should just have all the 
exceptions extend InvalidStateStoreException.

Presently, it's not possible to tell from just the 
"retriable"/"fatal" distinction what to do. You  can tell
from the descriptions of the various exceptions. E.g.:

Retriable:
 * StreamsRebalancingException: the exact same call 
    should just be retried until the rebalance is complete
 * StateStoreMigratedException: the store handle is
    now invalid, so you need to re-discover the instance 
    and get a new handle on that instance. In other words,
    the query itself may be valid, but the particular method 
    invocation on this particular instance has encountered 
    a fatal exception.

Fatal:
 * UnknownStateStoreException: this is truly fatal. No amount 
    of retrying or re-discovering is going to get you a handle on a 
    store that doesn't exist in the cluster.
 * StateStoreNotAvailableException: this is actually recoverable, 
    since the store might exist in the cluster, but isn't available on 
    this particular instance (which is shut down or whatever).

Personally, I'm not a fan of code bureaucracy, so I'm 100% fine 
with omitting the categorization and just having 5 subclasses 
of InvalidStateStoreException. Each of them would tell you 
how to handle them, and it's not too many to really 
understand and handle each one.

If you really want to have a middle tier, I'd recommend:
* RetryableStateStoreException: the exact same call 
    should be repeated.
* RecoverableStateStoreException: the store handle 
    should be discarded and the caller should re-discover 
    the location of the store and repeat the query on the 
    correct instance.
* FatalStateStoreException: the query/request is totally 
    invalid and will never succeed.

However, attempting to categorize the proposed exceptions 
reveals even problems with this categorization:
Retriable:
* StreamsRebalancingException
Recoverable:
* StateStoreMigratedException
* StreamsNotRunningException
Fatal:
* UnknownStateStoreException

But StreamsNotStartedException is strange... It means that 
one code path got a handle on a specific KafkaStreams object 
instance and sent it a query before another code path 
invoked the start() method on the exact same object instance. 
It seems like the most likely scenario is that whoever wrote 
the program just forgot to call start() before querying, in 
which case, retrying isn't going to help, and a fatal exception 
is more appropriate. I.e., it sounds like a "first 15 minutes 
experience" problem, and making it fatal would be more 
helpful. Even in a production context, there's no reason not 
to sequence your application startup such that you don't 
accept queries until after Streams is started. Thus, I guess 
I'd categorize it under "fatal".

Regardless of whether you make it fatal or retriable, you'd 
still have a whole category with only one exception in it, 
and the other two categories only have two exceptions. 
Plus, as you pointed out in the KIP, you can't get all 
exceptions in all cases anyway:
* store() can only throw NotStarted, NotRunning, 
    and Unknown
* actual store queries can only throw Rebalancing, 
    Migrated, and NotRunning

Thus, in practice also, there are exactly three categories 
and also exactly three exception types. It doesn't seem 
like there's a great advantage to the categories here. To 
avoid the categorization problem and also to clarify what 
exceptions can actually be thrown in different circumstances, 
it seems like we should just:
* get rid of the middle tier and make all the exceptions 
    extend InvalidStateStoreException
* drop StateStoreNotAvailableException in favor of 
    StreamsNotRunningException
* clearly document on all public methods which exceptions 
    need to be handled

How do you feel about this?
Thanks,
-John

On Wed, Jan 15, 2020, at 15:13, Bill Bejeck wrote:
> Thanks for KIP Vito.
> 
> Overall the KIP LGTM, but I'd have to agree with others on merging the
> `StreamsNotRunningException` and `StateStoreNotAvailableException` classes.
> 
> Since in both cases, the thread state is in `PENDING_SHUTDOWN ||
> NOT_RUNNING || ERROR` I'm not even sure how we could distinguish when to
> use the different
> exceptions.  Maybe a good middle ground would be to have a detailed
> exception message.
> 
> The KIP freeze is close, so I think if we can agree on this, we can wrap up
> the voting soon.
> 
> Thanks,
> Bill
> 
> On Tue, Jan 14, 2020 at 2:12 PM Matthias J. Sax <matthias@confluent.io>
> wrote:
> 
> > Vito,
> >
> > It's still unclear to me what the advantage is, to have both
> > `StreamsNotRunningException` and `StateStoreNotAvailableException`?
> >
> > For both cased, the state is `PENDING_SHUTDOWN / NOT_RUNNING / ERROR`
> > and thus, for a user point of view, why does it matter if the store is
> > closed on not? I don't understand why/how this information would be
> > useful? Do you have a concrete example in mind how a user would react
> > differently to both exceptions?
> >
> >
> > @Vinoth: about `StreamsRebalancingException` -- to me, it seems best to
> > actually do this on a per-query basis, ie, have an overload
> > `KafkaStreams#store(...)` that takes a boolean flag that allow to
> > _disable_ the exception and opt-in to query a active store during
> > recovery. However, as KIP-535 actually introduces this change in
> > behavior, I think KIP-216 should not cover this, but KIP-535 should be
> > updated. I'll follow up on the other KIP thread to raise this point.
> >
> >
> > -Matthias
> >
> > On 1/11/20 12:26 AM, Vito Jeng wrote:
> > > Hi, Matthias & Vinoth,
> > >
> > > Thanks for the feedback.
> > >
> > >> What is still unclear to me is, what we gain by having both
> > >> `StreamsNotRunningException` and `StateStoreNotAvailableException`. Both
> > >> exception are thrown when KafkaStreams is in state PENDING_SHUTDOWN /
> > >> NOT_RUNNING / ERROR. Hence, as a user what do I gain to know if the
> > >> state store is closed on not -- I can't query it anyway? Maybe I miss
> > >> something thought?
> > >
> > > Yes, both `StreamsNotRunningException` and
> > > `StateStoreNotAvailableException` are fatal exception.
> > > But `StateStoreNotAvailableException` is fatal exception about state
> > store
> > > related.
> > > I think it would be helpful that if user need to distinguish these two
> > > different case to handle it.
> > >
> > > I'm not very sure, does that make sense?
> > >
> > >
> > > ---
> > > Vito
> > >
> > >
> > > On Fri, Jan 10, 2020 at 3:35 AM Vinoth Chandar <vinoth@apache.org>
> > wrote:
> > >
> > >> +1 on merging `StreamsNotRunningException` and
> > >> `StateStoreNotAvailableException`, both exceptions are fatal anyway. IMO
> > >> its best to have these exceptions be about the state store (and not
> > streams
> > >> state), to easier understanding.
> > >>
> > >> Additionally, KIP-535 allows for querying of state stores in rebalancing
> > >> state. So do we need the StreamsRebalancingException?
> > >>
> > >>
> > >> On 2020/01/09 03:38:11, "Matthias J. Sax" <matthias@confluent.io>
> > wrote:
> > >>> Sorry that I dropped the ball on this...
> > >>>
> > >>> Thanks for updating the KIP. Overall LGTM now. Feel free to start a
> > VOTE
> > >>> thread.
> > >>>
> > >>> What is still unclear to me is, what we gain by having both
> > >>> `StreamsNotRunningException` and `StateStoreNotAvailableException`.
> > Both
> > >>> exception are thrown when KafkaStreams is in state PENDING_SHUTDOWN
/
> > >>> NOT_RUNNING / ERROR. Hence, as a user what do I gain to know if the
> > >>> state store is closed on not -- I can't query it anyway? Maybe I miss
> > >>> something thought?
> > >>>
> > >>>
> > >>> -Matthias
> > >>>
> > >>>
> > >>> On 11/3/19 6:07 PM, Vito Jeng wrote:
> > >>>> Sorry for the late reply, thanks for the review.
> > >>>>
> > >>>>
> > >>>>> About `StateStoreMigratedException`:
> > >>>>>
> > >>>>> Why is it only thrown if the state is REBALANCING? A store
might be
> > >>>>> migrated during a rebalance, and Kafka Streams might resume
back to
> > >>>>> RUNNING state and afterward somebody tries to use an old store
> > handle.
> > >>>>> Also, if state is REBALANCING, should we throw
> > >>>>> `StreamThreadRebalancingException`? Hence, I think
> > >>>>> `StateStoreMigratedException` does only make sense during `RUNNING`
> > >> state.
> > >>>>>
> > >>>>
> > >>>> Thank you point this, already updated.
> > >>>>
> > >>>>
> > >>>> Why do we need to distinguish between
> > `KafkaStreamsNotRunningException`
> > >>>>> and `StateStoreNotAvailableException`?
> > >>>>>
> > >>>>
> > >>>> `KafkaStreamsNotRunningException` may be caused by various reasons,
I
> > >> think
> > >>>> it would be helpful that the
> > >>>> user can distinguish whether it is caused by the state store closed.
> > >>>> (Maybe I am wrong...)
> > >>>>
> > >>>>
> > >>>> Last, why do we distinguish between `KafkaStreams` instance and
> > >>>>> `StreamsThread`? To me, it seems we should always refer to
the
> > >> instance,
> > >>>>> because that is the level of granularity in which we enable/disable
> > >> IQ atm.
> > >>>>>
> > >>>>
> > >>>> Totally agree. Do you mean the naming of state store exceptions?
> > >>>> I don't have special reason to distinguish these two.
> > >>>> Your suggestion look more reasonable for the exception naming.
> > >>>>
> > >>>>
> > >>>> Last, for `StateStoreMigratedException`, I would add that a user
need
> > >> to
> > >>>>> rediscover the store and cannot blindly retry as the store
handle is
> > >>>>> invalid and a new store handle must be retrieved. That is a
> > difference
> > >>>>> to `StreamThreadRebalancingException` that allows for "blind"
retries
> > >>>>> that either resolve (if the store is still on the same instance
after
> > >>>>> rebalancing finishes, or changes to `StateStoreMigratedException`
if
> > >> the
> > >>>>> store was migrated away during rebalancing).
> > >>>>>
> > >>>>
> > >>>> Nice, it's great! Thank you.
> > >>>>
> > >>>>
> > >>>> The KIP already updated, please take a look. :)
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Wed, Oct 23, 2019 at 1:48 PM Matthias J. Sax <
> > matthias@confluent.io
> > >>>
> > >>>> wrote:
> > >>>>
> > >>>>> Any update on this KIP?
> > >>>>>
> > >>>>> On 10/7/19 3:35 PM, Matthias J. Sax wrote:
> > >>>>>> Sorry for the late reply. The 2.4 deadline kept us quite
busy.
> > >>>>>>
> > >>>>>> About `StateStoreMigratedException`:
> > >>>>>>
> > >>>>>> Why is it only thrown if the state is REBALANCING? A store
might be
> > >>>>>> migrated during a rebalance, and Kafka Streams might resume
back to
> > >>>>>> RUNNING state and afterward somebody tries to use an old
store
> > >> handle.
> > >>>>>> Also, if state is REBALANCING, should we throw
> > >>>>>> `StreamThreadRebalancingException`? Hence, I think
> > >>>>>> `StateStoreMigratedException` does only make sense during
`RUNNING`
> > >>>>> state.
> > >>>>>>
> > >>>>>>
> > >>>>>> Why do we need to distinguish between
> > >> `KafkaStreamsNotRunningException`
> > >>>>>> and `StateStoreNotAvailableException`?
> > >>>>>>
> > >>>>>>
> > >>>>>> Last, why do we distinguish between `KafkaStreams` instance
and
> > >>>>>> `StreamsThread`? To me, it seems we should always refer
to the
> > >> instance,
> > >>>>>> because that is the level of granularity in which we enable/disable
> > >> IQ
> > >>>>> atm.
> > >>>>>>
> > >>>>>>
> > >>>>>> Last, for `StateStoreMigratedException`, I would add that
a user
> > >> need to
> > >>>>>> rediscover the store and cannot blindly retry as the store
handle is
> > >>>>>> invalid and a new store handle must be retrieved. That
is a
> > >> difference
> > >>>>>> to `StreamThreadRebalancingException` that allows for "blind"
> > retries
> > >>>>>> that either resolve (if the store is still on the same
instance
> > after
> > >>>>>> rebalancing finishes, or changes to `StateStoreMigratedException`
if
> > >> the
> > >>>>>> store was migrated away during rebalancing).
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> -Matthias
> > >>>>>>
> > >>>>>> On 8/9/19 10:20 AM, Vito Jeng wrote:
> > >>>>>>> My bad. The short link `https://shorturl.at/CDNT9`
> > <https://shorturl.at/CDNT9>
> > >> <https://shorturl.at/CDNT9>
> > >>>>> <https://shorturl.at/CDNT9>
> > >>>>>>> <https://shorturl.at/CDNT9> seems incorrect.
> > >>>>>>>
> > >>>>>>> Please use the following instead: https://shorturl.at/bkKQU
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> ---
> > >>>>>>> Vito
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Fri, Aug 9, 2019 at 10:53 AM Vito Jeng <vito@is-land.com.tw>
> > >> wrote:
> > >>>>>>>
> > >>>>>>>> Thanks, Matthias!
> > >>>>>>>>
> > >>>>>>>>> About `StreamThreadNotStartedException`:
> > >>>>>>>>
> > >>>>>>>> Thank you for explanation. I agree with your opinion.
> > >>>>>>>> `CompositeReadOnlyXxxStore#get()` would never throw
> > >>>>>>>> `StreamThreadNotStartedException`.
> > >>>>>>>>
> > >>>>>>>> For the case that corresponding thread crashes
after we handed out
> > >> the
> > >>>>>>>> store handle. We may throw `KafkaStreamsNotRunningException`
or
> > >>>>>>>> `StateStoreMigratedException`.
> > >>>>>>>> In `StreamThreadStateStoreProvider`, we would throw
> > >>>>>>>> `KafkaStreamsNotRunningException` when stream thread
is not
> > >> running(
> > >>>>>>>> https://shorturl.at/CDNT9) or throw `StateStoreMigratedException`
> > >> when
> > >>>>>>>> store is closed(https://shorturl.at/hrvAN). So
I think we do not
> > >> need
> > >>>>> to
> > >>>>>>>> add a new type for this case. Does that make sense?
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>> About `KafkaStreamsNotRunningException` vs
> > >>>>>>>> `StreamThreadNotRunningException`:
> > >>>>>>>>
> > >>>>>>>> I understand your point. I rename
> > >> `StreamThreadNotRunningException` to
> > >>>>>>>> `KafkaStreamsNotRunningException`.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> About check unknown state store names:
> > >>>>>>>> Thank you for the hint. I add a new type
> > >> `UnknownStateStoreException`
> > >>>>> for
> > >>>>>>>> this case.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>> Also, we should still have fatal exception
> > >>>>>>>> `StateStoreNotAvailableException`? Not sure why
you remove it?
> > >>>>>>>>
> > >>>>>>>> Thank you point this, already add it again.
> > >>>>>>>>
> > >>>>>>>> The KIP already updated, please take a look.
> > >>>>>>>>
> > >>>>>>>> ---
> > >>>>>>>> Vito
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>>
> > >>
> > >
> >
> >
>

Mime
View raw message