zookeeper-dev mailing list archives

From "Andor Molnar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2985) Expired session may unexpired after leader failover
Date Tue, 06 Mar 2018 15:45:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387978#comment-16387978 ]

Andor Molnar commented on ZOOKEEPER-2985:
-----------------------------------------

[~cthunes]

Thanks for reporting this.

I think this is related to https://issues.apache.org/jira/browse/ZOOKEEPER-1208, which
intentionally introduced the closing state for sessions that have expired but whose
`closeSession` has not yet been acknowledged by the quorum.

Will the ephemerals eventually be removed once the quorum is established, or will they survive
forever because of the race condition?

> Expired session may unexpired after leader failover
> ---------------------------------------------------
>
>                 Key: ZOOKEEPER-2985
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2985
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.5.3, 3.4.11
>            Reporter: Chris Thunes
>            Priority: Major
>
> We recently observed an inconsistency in our Kafka cluster which we tracked down to ZooKeeper
sessions expiring and then re-appearing after a ZooKeeper leadership failover. The Kafka nodes
received session "Expired" events, leading to them starting new sessions and attempting to
re-create some ephemeral nodes (broker ID nodes in kafka/brokers/ids specifically). However,
between receiving the session Expired event and establishing a new session a leadership failover
occurred within the ZooKeeper cluster which resulted in the expired session re-appearing.
When Kafka attempted to re-create the ephemeral nodes mentioned above it (unexpectedly) received
NODEEXISTS errors.
> This behavior is a result of how session expiration is handled by the leader. Specifically,
the expired session is marked as "closing" immediately upon expiration (in SessionTrackerImpl)
and _before_ the corresponding "closeSession" entry is committed. A client can therefore
receive a session Expired event before its session is fully closed. A leadership failover
which results in the loss of the (uncommitted) closeSession entry thus leads to the session's
ephemeral nodes "re-appearing" until another expiration of the old session on the new leader
takes place.
> I'm not certain if this should be considered a bug or an edge case that clients are expected
to handle. If it is the latter then I think it would be good to include this in the Programmer's
Guide in the documentation.
> If it's helpful I have code to reproduce this on an in-process cluster running 3.4.11
or 3.5.3-beta.
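
To make the sequence in the report easier to follow, here is a minimal toy model of the
race. It is only a sketch and does not use any ZooKeeper code or API: the `Server` and
`Leader` classes, the `simulate` function, the session ids 1 and 2, and the broker node
path are all hypothetical names chosen for illustration. The one behaviour taken from the
report is that the client is told "Expired" before the closeSession entry is committed.

```python
from dataclasses import dataclass, field

# Illustrative path only; the trailing broker id "0" is an assumption.
BROKER_PATH = "/kafka/brokers/ids/0"


@dataclass
class Server:
    """Toy replica holding committed state only."""
    sessions: set = field(default_factory=set)
    ephemerals: dict = field(default_factory=dict)  # path -> owning session id

    def apply_close_session(self, session_id):
        # A committed closeSession removes the session and its ephemerals.
        self.sessions.discard(session_id)
        self.ephemerals = {
            p: s for p, s in self.ephemerals.items() if s != session_id
        }


@dataclass
class Leader:
    """Toy leader; `closing` and `pending` are local and lost on failover."""
    server: Server
    closing: set = field(default_factory=set)
    pending: list = field(default_factory=list)

    def expire(self, session_id):
        # The session is marked closing and the client is notified
        # before the closeSession entry has been committed.
        self.closing.add(session_id)
        self.pending.append(session_id)
        return "Expired"

    def commit_pending(self, followers):
        for session_id in self.pending:
            for replica in [self.server, *followers]:
                replica.apply_close_session(session_id)
        self.pending.clear()


def create_ephemeral(server, path, session_id):
    if path in server.ephemerals:
        return "NODEEXISTS"
    server.sessions.add(session_id)
    server.ephemerals[path] = session_id
    return "OK"


def simulate(failover_before_commit):
    old_leader_state = Server({1}, {BROKER_PATH: 1})
    follower = Server({1}, {BROKER_PATH: 1})
    leader = Leader(old_leader_state)

    event = leader.expire(1)
    if not failover_before_commit:
        leader.commit_pending([follower])
    # Otherwise the old leader is gone and its pending entry is lost.
    # In both cases the former follower serves the client's new session (id 2).
    return event, create_ephemeral(follower, BROKER_PATH, 2)


print(simulate(failover_before_commit=False))
print(simulate(failover_before_commit=True))
```

In this model the client sees "Expired" in both runs, but the re-create only succeeds when
the closeSession entry was committed before the failover. When the entry is lost, the old
session and its ephemeral node are still present on the new leader, which matches the
NODEEXISTS error described above.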



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
