kafka-jira mailing list archives

From "Matthias J. Sax (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (KAFKA-5786) Yet another exception is causing that streamming app is zombie
Date Wed, 30 Aug 2017 19:03:01 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147838#comment-16147838 ]

Matthias J. Sax edited comment on KAFKA-5786 at 8/30/17 7:02 PM:
-----------------------------------------------------------------

Thanks for the logs. If I read them correctly, some of your threads miss a rebalance because
state recreation took a long time during a previous rebalance. As a result, they drop out of
the consumer group without noticing it. When the next rebalance happens, they try to commit
but fail, because they are no longer part of the group. This issue should be fixed by KAFKA-5152.
Nevertheless, KAFKA-5152 only covers {{CommitFailedException}}, as in your case. A proper fix
would be to not let the thread die on any exception in the first place. We already have a JIRA
for this: KAFKA-5541
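The failure sequence described above can be illustrated with a small self-contained model. So can the direction of the proper fix, which is to recover instead of letting the thread die. This is a minimal sketch, not Kafka code. `Coordinator`, `Worker`, and `CommitRejectedException` are hypothetical names, and group membership is modelled by a simple generation counter. The model also assumes that a rejected commit can be recovered by rejoining the group. It does not show what KAFKA-5152 or KAFKA-5541 actually implement.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for consumer-group machinery; none of these are Kafka APIs.
// They only model: missed rebalance -> rejected commit -> recover instead of dying.
public class RebalanceRecoverySketch {

    /** Models a commit rejected because the member was dropped from the group. */
    static class CommitRejectedException extends RuntimeException {
        CommitRejectedException(String msg) { super(msg); }
    }

    /** Hypothetical coordinator; each rebalance starts a new generation. */
    static class Coordinator {
        private int generation = 1;
        void rebalance() { generation++; }
        int generation() { return generation; }
    }

    /** Hypothetical worker thread body that remembers the generation it joined. */
    static class Worker {
        private final Coordinator coordinator;
        private int joinedGeneration;
        final List<String> events = new ArrayList<>();

        Worker(Coordinator coordinator) {
            this.coordinator = coordinator;
            this.joinedGeneration = coordinator.generation();
        }

        void commit() {
            // A worker with a stale generation is no longer part of the group.
            if (joinedGeneration != coordinator.generation()) {
                throw new CommitRejectedException("stale generation " + joinedGeneration);
            }
            events.add("committed in generation " + joinedGeneration);
        }

        void rejoin() {
            joinedGeneration = coordinator.generation();
            events.add("rejoined generation " + joinedGeneration);
        }

        /** One loop iteration: a rejected commit triggers a rejoin instead of killing the thread. */
        void runOnce() {
            try {
                commit();
            } catch (CommitRejectedException e) {
                events.add("commit rejected: " + e.getMessage());
                rejoin();
            }
        }
    }

    public static void main(String[] args) {
        Coordinator coordinator = new Coordinator();
        Worker worker = new Worker(coordinator);
        worker.runOnce();        // normal commit
        coordinator.rebalance(); // rebalance missed, e.g. during long state restoration
        worker.runOnce();        // commit rejected, worker rejoins
        worker.runOnce();        // commits again in the new generation
        worker.events.forEach(System.out::println);
    }
}
```

The point of the model is that a rejected commit is treated as a signal to rejoin, so the thread keeps running instead of ending in the DEAD state.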

I am going to close this as a duplicate. In 0.11.0.1, the probability of hitting this issue
should be reduced (via KAFKA-5152), and I hope to get KAFKA-5541 into 1.0, which should deliver
the proper fix.

Thanks for reporting the issue! Btw: you can also follow KAFKA-5156 for further improvements
on internal exception handling.


was (Author: mjsax):
Thanks for the logs. If I read them correctly, some of your threads miss a rebalance because
state recreation took a long time during a previous rebalance. As a result, they drop out of
the consumer group without noticing it. When the next rebalance happens, they try to commit
but fail, because they are no longer part of the group. This issue should be mitigated by
KAFKA-5152. Nevertheless, a proper fix would be to not let the thread die in the first place.
We already have a JIRA for this: KAFKA-5541

I am going to close this as a duplicate. In 0.11.0.1, the probability of hitting this issue
should be reduced (via KAFKA-5152), and I hope to get KAFKA-5541 into 1.0, which should deliver
the proper fix.

Thanks for reporting the issue! Btw: you can also follow KAFKA-5156 for further improvements
on internal exception handling.

> Yet another exception is causing that streamming app is zombie
> --------------------------------------------------------------
>
>                 Key: KAFKA-5786
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5786
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Seweryn Habdank-Wojewodzki
>         Attachments: fatal-errors-by-rebalancing.zip
>
>
> An unhandled exception in a streaming app leaves the process in a zombie state.
> {code}
> 2017-08-24 15:17:40 WARN  StreamThread:978 - stream-thread [kafka-endpoint-1236e6d5-75f0-4c14-b025-78e632484a26-StreamThread-3] Unexpected state transition from RUNNING to DEAD.
> 2017-08-24 15:17:40 FATAL StreamProcessor:67 - Caught unhandled exception: stream-thread [kafka-endpoint-1236e6d5-75f0-4c14-b025-78e632484a26-StreamThread-3] Failed to rebalance.; [org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:589), org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553), org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527)] in thread kafka-endpoint-1236e6d5-75f0-4c14-b025-78e632484a26-StreamThread-3
> {code}
> The final state of the app is similar to KAFKA-5779, but the exception is different and is
> thrown from a different place.
> The exception should be handled so that the application either continues working or shuts
> down completely if the error is not recoverable.
> The current situation, in which the application becomes a zombie, is not good.
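The request above has a "quit completely" branch for errors that are not recoverable. One way to serve it is a fail-fast policy: if any worker thread dies, stop all of them rather than keep running with some threads silently gone. The sketch below is plain Java with no Kafka dependency, and `FailFastSupervisor` and its methods are hypothetical. In Kafka Streams, the comparable hook is `KafkaStreams#setUncaughtExceptionHandler`. An application could trigger its own shutdown from that handler, for example by handing the shutdown to a separate thread. How to do that safely is an assumption that depends on the Streams version in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.IntConsumer;

/**
 * Hypothetical fail-fast supervisor (not a Kafka API). If any worker thread dies
 * from an uncaught exception, every worker is asked to stop. The process therefore
 * never keeps running as a "zombie" with some of its threads silently gone.
 */
public class FailFastSupervisor {
    private final List<Thread> workers = new ArrayList<>();
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private final AtomicReference<Throwable> firstFailure = new AtomicReference<>();

    /** Starts `count` workers; each calls step.accept(workerId) repeatedly until stopped. */
    public void start(int count, IntConsumer step) {
        for (int i = 0; i < count; i++) {
            final int id = i;
            Thread t = new Thread(() -> {
                while (!stopRequested.get()) {
                    step.accept(id); // an exception thrown here terminates this thread
                    try {
                        Thread.sleep(1); // stand-in for waiting on the next batch of work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }, "worker-" + i);
            // Runs in the dying thread: record the first error and ask all workers to stop.
            t.setUncaughtExceptionHandler((thread, error) -> {
                firstFailure.compareAndSet(null, error);
                stopRequested.set(true);
            });
            workers.add(t);
        }
        workers.forEach(Thread::start);
    }

    /** Requests a normal shutdown of all workers. */
    public void stop() {
        stopRequested.set(true);
    }

    /** Waits for every worker to exit; returns the first fatal error, or null if none. */
    public Throwable awaitTermination() {
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return firstFailure.get();
    }
}
```

For example, suppose three workers are started and worker 1 throws `new IllegalStateException("Failed to rebalance.")`. Then `awaitTermination()` returns that exception once all three threads have exited, and no thread is left running on its own.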



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
