kafka-jira mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-6593) Coordinator disconnect in heartbeat thread can cause commitSync to block indefinitely
Date Tue, 27 Feb 2018 18:14:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379070#comment-16379070 ]

ASF GitHub Bot commented on KAFKA-6593:

hachikuji opened a new pull request #4625: KAFKA-6593 [WIP]; Fix livelock with consumer heartbeat thread in commitSync
URL: https://github.com/apache/kafka/pull/4625
   Contention for the lock in ConsumerNetworkClient can lead to a livelock in which an active commitSync is unable to make progress because its completion is blocked in the heartbeat thread. The fix is twofold:
   1) We change ConsumerNetworkClient to use a fair lock to reduce the chance of either thread being starved.
   2) We eliminate the dependence on the lock in ConsumerNetworkClient for callback completion so that callbacks will not be blocked by an active poll().
   I've left this as a WIP patch since I am still considering test cases.
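   The two parts of the fix can be sketched roughly as follows. This is not the code in the pull request: the class `FairClientSketch` and all of its method names are hypothetical, chosen only for illustration. The sketch assumes a fair `ReentrantLock` for part 1 and a `ConcurrentLinkedQueue` of pending callbacks for part 2, so that a completion can be recorded without waiting for a lock held by an active poll().

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a network client shared by a foreground thread
// and a heartbeat thread. Not the actual ConsumerNetworkClient code.
class FairClientSketch {
    // (1) Fair lock: waiting threads acquire it roughly in arrival order,
    // which reduces the chance that one thread starves the other.
    private final ReentrantLock lock = new ReentrantLock(true);

    // (2) Completions are recorded in a concurrent queue, so the thread that
    // observes a response or a disconnect never has to take the lock.
    private final Queue<Runnable> pendingCompletions = new ConcurrentLinkedQueue<>();

    boolean isFair() {
        return lock.isFair();
    }

    // May be called from any thread, even while another thread is inside poll().
    void enqueueCompletion(Runnable callback) {
        pendingCompletions.add(callback);
    }

    // Runs every queued callback and returns how many ran. Takes no lock.
    int firePendingCompletions() {
        int fired = 0;
        Runnable callback;
        while ((callback = pendingCompletions.poll()) != null) {
            callback.run();
            fired++;
        }
        return fired;
    }

    // Simulated poll: network I/O would happen under the lock (placeholder
    // here); queued callbacks are fired only after the lock is released.
    int poll() {
        lock.lock();
        try {
            // network I/O placeholder
        } finally {
            lock.unlock();
        }
        return firePendingCompletions();
    }
}
```

   In this sketch a callback queued by one thread is run by whichever thread next calls `poll()` or `firePendingCompletions()`, and neither step waits on the lock.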
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

> Coordinator disconnect in heartbeat thread can cause commitSync to block indefinitely
> -------------------------------------------------------------------------------------
>                 Key: KAFKA-6593
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6593
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 1.0.0,
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Major
>             Fix For: 1.1.0
>         Attachments: consumer.log
> If a coordinator disconnect is observed in the heartbeat thread, it can cause a pending offset commit to be cancelled just before the foreground thread begins waiting on its response in poll(). Since the poll timeout is Long.MAX_VALUE, this will cause the consumer to effectively hang until some other network event causes the poll() to return. We try to protect against this case with a poll condition on the future, but this isn't bulletproof since the future can be completed outside of the lock.
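
The gap in the poll condition can be illustrated with a simplified model. This is a sketch under stated assumptions, not the consumer's code: `PollConditionSketch`, `effectiveTimeout`, and `timeoutAfterLateCompletion` are hypothetical names. It assumes the poll condition is consulted once to choose the blocking timeout, and it replays the problematic interleaving on a single thread so the outcome is deterministic.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Simplified, hypothetical model of a poll guarded by a condition on a future.
class PollConditionSketch {
    // Chooses how long the poll may block: not at all if the condition says
    // the future is already done, otherwise the full requested timeout.
    static long effectiveTimeout(long requestedTimeoutMs, BooleanSupplier shouldBlock) {
        return shouldBlock.getAsBoolean() ? requestedTimeoutMs : 0L;
    }

    // Replays the race on one thread: the condition is checked first, then the
    // future is completed (as another thread could do without the lock).
    static long timeoutAfterLateCompletion() {
        AtomicBoolean futureDone = new AtomicBoolean(false);
        // Foreground thread: future not done yet, so it decides to block.
        long timeout = effectiveTimeout(Long.MAX_VALUE, () -> !futureDone.get());
        // Other thread: completes (or cancels) the future after the check.
        futureDone.set(true);
        // The timeout was already chosen and is not revisited.
        return timeout;
    }
}
```

In this model the chosen timeout remains Long.MAX_VALUE even though the future is done, which corresponds to the hang described above: nothing returns the poll until some other network event arrives.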

This message was sent by Atlassian JIRA
