kafka-dev mailing list archives

From "Colin P. McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-5004) poll() timeout not enforced when connecting to 0.10.0 broker
Date Wed, 03 May 2017 17:23:04 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15995257#comment-15995257 ]

Colin P. McCabe commented on KAFKA-5004:

Thanks for filing this, [~mjsax].  I think the severity is mitigated somewhat by the fact
that there has to be a client-side bug (polling thread dies) to trigger the bad behavior.

bq. IMHO, a "clean" solution would be to disable the heartbeat thread if the client connects
to a 0.10.0 broker and send heartbeats on poll(), as the 0.10.0 consumer does. Not sure how complex
this would be to do, though.

I think this would be a bit risky since we'd be adding code that only ever gets used in a
very obscure error path when talking to 0.10.0 brokers.  It's not likely to be well-tested.

bq. [~cmccabe] had the idea to set a "flag" on the heartbeat thread each time poll() is called,
and let the heartbeat thread stop if max.poll.interval.ms has passed and the flag was not "renewed".

Yeah, this might be a good option.
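
To make the "flag" idea concrete, here is a minimal sketch of how the bookkeeping could look. It is illustrative only: PollLivenessTracker, recordPoll() and shouldStopHeartbeat() are hypothetical names, not part of the Kafka client, and timestamps are passed in explicitly so the logic is easy to test. The polling thread would renew the timestamp on every poll(), and the heartbeat thread would check it before each heartbeat.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, NOT part of the Kafka client API.
// Tracks when poll() was last called so the heartbeat thread can
// stop itself once max.poll.interval.ms has elapsed without a poll.
final class PollLivenessTracker {
    private final long maxPollIntervalMs;
    private final AtomicLong lastPollMs;

    PollLivenessTracker(long maxPollIntervalMs, long nowMs) {
        this.maxPollIntervalMs = maxPollIntervalMs;
        this.lastPollMs = new AtomicLong(nowMs);
    }

    // Called by the polling thread on every poll(): "renews" the flag.
    void recordPoll(long nowMs) {
        lastPollMs.set(nowMs);
    }

    // Called by the heartbeat thread before sending each heartbeat.
    // Returns true once more than maxPollIntervalMs has passed since
    // the last recorded poll.
    boolean shouldStopHeartbeat(long nowMs) {
        return nowMs - lastPollMs.get() > maxPollIntervalMs;
    }
}
```

With something like this, a dead polling thread would cause heartbeats to stop after max.poll.interval.ms, so a 0.10.0 broker would eventually expire the session and trigger a rebalance.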

> poll() timeout not enforced when connecting to 0.10.0 broker
> ------------------------------------------------------------
>                 Key: KAFKA-5004
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5004
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>    Affects Versions:
>            Reporter: Matthias J. Sax
> In 0.10.1, the heartbeat thread and the new poll timeout {{max.poll.interval.ms}} were introduced
> via KIP-62. In 0.10.2, we added client-broker backward compatibility.
> Now, if a 0.10.2 client connects to a 0.10.0 broker, the broker only understands the heartbeat
> timeout but not the poll timeout, while the client is still using the heartbeat background
> thread. Thus, the new client config {{max.poll.interval.ms}} is ignored.
> In the worst case, the polling thread might die while the heartbeat thread is still up.
> Thus, the broker would not time out the client and no rebalance would be triggered, while at
> the same time the client is effectively dead, not making any progress on its assigned partitions.

This message was sent by Atlassian JIRA
