zookeeper-dev mailing list archives

From "Shawn Heisey (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2348) Data between leader and followers are not synchronized.
Date Tue, 30 Apr 2019 04:57:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829949#comment-16829949 ]

Shawn Heisey commented on ZOOKEEPER-2348:
-----------------------------------------

I will admit that trying to reconcile what our user has reported with the description here
is making my head hurt.  But it sounds to me like their situation and the one described
here are at least similar, if not identical.  Solr is running the ZK 3.4.13 client.  The version
info from the user is "For context this a cluster running Solr 7.7.1 and ZooKeeper 3.4.13
(being monitored by Exhibitor 1.7.1)."  So I think they're running 3.4.13 on the server side
as well.

Here's the detailed scenario we got:

  *   We have three ZooKeeper nodes: A, B, and C. A is the leader of the ensemble.
  *   ZooKeeper A becomes partitioned from ZooKeeper B and C and the Solr tier.
  *   Some Solr nodes log “zkclient has disconnected” warnings and ZooKeeper A expires
some Solr client sessions due to timeouts. The partition between ZooKeeper A and the Solr
tier ends, and Solr nodes that were connected to ZooKeeper A attempt to renew their sessions
but are told their sessions have expired. [1]
     *   Note that I’m simplifying: some nodes that were connected to ZooKeeper A were able
to move their sessions to ZooKeeper B/C before their session expired. [2]
  *   ZooKeeper A realizes it is not synced with ZooKeeper B and C and closes connections
with Solr nodes and, apparently, remains partitioned from B/C.
  *   ZooKeeper B and C eventually elect ZooKeeper B as the leader and start accepting write
requests, as they form a quorum.
  *   Solr nodes previously connected to ZooKeeper A that had their sessions expire now connect
to ZooKeeper B and C. They successfully publish their state as DOWN and then attempt to write
to /live_nodes to signal that they’re reconnected to ZooKeeper.
  *   The writes of the ephemeral znodes to /live_nodes fail with NodeExists exceptions [3].
The failed writes are logged on ZooKeeper B. [4]
     *   It looks like a failure mode of “leader becomes partitioned and ephemeral znode
deletions are not processed by followers” is documented on ZOOKEEPER-2348<https://jira.apache.org/jira/browse/ZOOKEEPER-2348>.
  *   ZooKeeper A eventually rejoins the ensemble, and the /live_nodes entries that expired
after the initial partition are removed when session expirations are reprocessed on the new
leader (ZooKeeper B). [5]
  *   The Solr nodes whose attempts at writing to /live_nodes failed never try again and remain
in the GONE state for 6+ hours.

I think there's probably some work we can do in Solr to improve how we manage the ephemeral
node creation so it's more robust.
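
As a rough illustration of what "more robust" could mean, here is a minimal Java sketch of a
bounded retry around the ephemeral node creation. It is not Solr's actual code and does not use
the real ZooKeeper client: EphemeralStore, InMemoryStore, and register are hypothetical names made
up for this sketch. With the real client, the "already exists" case would surface as a
KeeperException.NodeExistsException, and the owning session could be read from
Stat.getEphemeralOwner(). The policy shown (wait for the stale node to go away, then retry) is
an assumption, not a settled design.

```java
import java.util.HashMap;
import java.util.Map;

final class LiveNodeRetrySketch {

    /** Hypothetical stand-in for the small slice of a coordination client used here. */
    interface EphemeralStore {
        /** Returns true if the node was created, false if a node already exists at path. */
        boolean createEphemeral(String path, long sessionId);

        /** Returns the session id that owns the node, or null if the node does not exist. */
        Long ownerOf(String path);
    }

    /** Toy in-memory store, only here so the sketch can run without a server. */
    static final class InMemoryStore implements EphemeralStore {
        private final Map<String, Long> nodes = new HashMap<>();

        @Override
        public boolean createEphemeral(String path, long sessionId) {
            return nodes.putIfAbsent(path, sessionId) == null;
        }

        @Override
        public Long ownerOf(String path) {
            return nodes.get(path);
        }

        /** Simulates the server finally processing a session expiration. */
        void expireSession(long sessionId) {
            nodes.values().removeIf(owner -> owner == sessionId);
        }
    }

    /**
     * Tries to create the ephemeral node, retrying while a node left behind by another
     * (presumably expired) session is still present.
     *
     * @return the attempt number that succeeded, or -1 if every attempt failed
     */
    static int register(EphemeralStore store, String path, long sessionId,
                        int maxAttempts, Runnable backoff) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (store.createEphemeral(path, sessionId)) {
                return attempt;
            }
            Long owner = store.ownerOf(path);
            if (owner != null && owner == sessionId) {
                // The node already belongs to our current session; nothing to do.
                return attempt;
            }
            if (attempt < maxAttempts) {
                // Stale node from another session: wait for it to be cleaned up.
                backoff.run();
            }
        }
        return -1;
    }
}
```

In a real deployment the backoff would sleep (ideally with some jitter), and a -1 result would
need to be surfaced loudly or rescheduled, rather than leaving the node out of /live_nodes for
hours as in the scenario above.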

> Data between leader and followers are not synchronized.
> -------------------------------------------------------
>
>                 Key: ZOOKEEPER-2348
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2348
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.5.1
>            Reporter: Echo Chen
>            Priority: Major
>
> When a client session expired, the leader tried to remove it from the session map and remove its
EPHEMERAL znode, for example, /test_znode. This operation succeeded on the leader, but at the very
same time a network fault happened, so the deletion was not synced to the followers, and a new
leader election was launched. After the leader election finished, the new leader was not the old
leader. We found that the znode /test_znode still existed on the followers but not on the old leader.
>  *Scenario :* 
> 1) Create a znode, e.g.
> {{/rmstore/ZKRMStateRoot/RMAppRoot/application_1449644945944_0001/appattempt_1449644945944_0001_000001}}
> 2) Delete the znode.
> 3) A network fault occurs between the follower and leader machines.
> 4) Leader election runs again and a follower becomes the leader.
> Now the data is not synced with the new leader. After this, the client is not able to create the same znode.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
