kafka-dev mailing list archives

From "Bob Cotton (JIRA)" <j...@apache.org>
Subject [jira] [Created] (KAFKA-764) Race Condition in Broker Registration after ZooKeeper disconnect
Date Tue, 19 Feb 2013 21:07:14 GMT
Bob Cotton created KAFKA-764:

             Summary: Race Condition in Broker Registration after ZooKeeper disconnect
                 Key: KAFKA-764
                 URL: https://issues.apache.org/jira/browse/KAFKA-764
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 0.7.1
            Reporter: Bob Cotton

When running our ZooKeepers in VMware, occasionally all the keepers pause simultaneously, long
enough for the Kafka clients to time out, and then the keepers un-pause simultaneously.

When this happens, the zk clients disconnect from ZooKeeper. When ZooKeeper comes back, ZkUtils.createEphemeralPathExpectConflict
finds that the broker's own id node still exists, so it does not re-register the broker id node
and the function call succeeds. ZooKeeper then figures out that the broker disconnected from the
keeper and deletes the ephemeral node *after* allowing the consumer to read the data in the
/brokers/ids/x node. The broker then goes on to register all the topics, etc. When consumers
connect, they see topic nodes associated with the broker but they can't find the broker node to
get connection information for the broker, sending them into a rebalance loop until they reach
rebalance.retries.max and fail.
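
The sequence above can be illustrated with a toy in-memory model. This is only a
sketch: FakeZk, the session ids, the host:port string and the topic path are
hypothetical stand-ins, not the ZooKeeper client API or the actual ZkUtils code.

```python
class FakeZk:
    """Toy in-memory stand-in for ZooKeeper (hypothetical, not the real API)."""

    def __init__(self):
        self.nodes = {}  # path -> (data, owning session id)

    def create_ephemeral(self, path, data, session_id):
        if path in self.nodes:
            raise KeyError(path)  # stands in for a "node exists" error
        self.nodes[path] = (data, session_id)

    def read(self, path):
        return self.nodes[path][0]

    def expire_session(self, session_id):
        # Ephemeral nodes are removed when their owning session expires.
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != session_id}


def create_ephemeral_expect_conflict(zk, path, data, session_id):
    """Simplified model of the behaviour described in this report."""
    try:
        zk.create_ephemeral(path, data, session_id)
    except KeyError:
        if zk.read(path) != data:
            raise
        # Same data: the node is assumed to be ours, so nothing is re-created.


zk = FakeZk()
broker_path = "/brokers/ids/1"
info = "host1:9092"  # hypothetical connection info

# Initial registration under the old session.
create_ephemeral_expect_conflict(zk, broker_path, info, session_id="old")

# After the pause the broker reconnects with a new session. The old session's
# node has not been cleaned up yet, so the call returns without re-creating it.
create_ephemeral_expect_conflict(zk, broker_path, info, session_id="new")

# The broker goes on to register its topics under the new session.
zk.create_ephemeral("/brokers/topics/t1/1", "1", session_id="new")

# ZooKeeper now expires the old session and deletes its ephemeral nodes.
zk.expire_session("old")

broker_registered = broker_path in zk.nodes
topic_registered = "/brokers/topics/t1/1" in zk.nodes
print(broker_registered, topic_registered)  # False True
```

The final state matches what the consumers observe: a topic node that refers to
broker 1, but no /brokers/ids/1 node to read connection information from.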

This might also be a ZooKeeper issue, but the desired behavior for a disconnect case might
be to explicitly delete and recreate the broker node if it is found.
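
A minimal sketch of that delete-and-recreate idea, using a toy in-memory model.
FakeZk, register_broker and the session ids are hypothetical names for
illustration only; this is not a patch against ZkUtils and it ignores ACLs and
retries.

```python
class FakeZk:
    """Toy in-memory stand-in for ZooKeeper (hypothetical, not the real API)."""

    def __init__(self):
        self.nodes = {}  # path -> (data, owning session id)

    def create_ephemeral(self, path, data, session_id):
        if path in self.nodes:
            raise KeyError(path)  # stands in for a "node exists" error
        self.nodes[path] = (data, session_id)

    def delete(self, path):
        self.nodes.pop(path, None)

    def expire_session(self, session_id):
        # Ephemeral nodes are removed when their owning session expires.
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != session_id}


def register_broker(zk, path, data, session_id):
    """Hypothetical fix: drop any existing node, then create it afresh so the
    node is owned by the current session."""
    if path in zk.nodes:
        zk.delete(path)
    zk.create_ephemeral(path, data, session_id)


zk = FakeZk()
broker_path = "/brokers/ids/1"
info = "host1:9092"  # hypothetical connection info

register_broker(zk, broker_path, info, session_id="old")  # first start
register_broker(zk, broker_path, info, session_id="new")  # after reconnect
zk.expire_session("old")                                  # late cleanup

owner = zk.nodes[broker_path][1]
print(owner)  # new
```

Because the node is re-created under the new session, the late expiry of the old
session no longer removes it in this model.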
