zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Beom Heyn Kim (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not properly truncate uncommitted write resulting in data inconsistency
Date Mon, 27 Nov 2017 19:23:00 GMT

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Beom Heyn Kim updated ZOOKEEPER-2945:
-------------------------------------
    Attachment: zookeeper-2945.patch

When the follower receives SNAP from the leader, it should also try to truncate transactions
later than the zxid of the Snapshot from the leader.

> Synchronization code on the follower does not properly truncate uncommitted write resulting
in data inconsistency
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2945
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2945
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.11, 3.4.12
>            Reporter: Beom Heyn Kim
>         Attachments: zookeeper-2945.patch
>
>
> Synchronization code in the syncWithLeader function of Learner.java doesn't seems to
truncate uncommitted writes properly when the follower receives SNAP msg from the leader.
This results in data inconsistency in the in-memory data tree across nodes. Here is one procedure
to reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832,
but it was for 3.4.10 and this one is for 3.4.11 and later)
> Initially:
> # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader)
> # Create 5 znodes with initial values as follow (key = value)
> {noformat}
> /testDivergenceResync0 = 0
> /testDivergenceResync1 = 1
> /testDivergenceResync2 = 2
> /testDivergenceResync3 = 3
> /testDivergenceResync4 = 4
> {noformat}
> To Reproduce:
> # Diverge the node 2
> a. Shutdown the node 0 and 1
> b. Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
> c. Shutdown the node 2
> # Restart the node 0 and 1 (let them finish with resync)
> # Diverge the node 1
> a. Shutdown the node 0
> b. Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
> c. Shutdown the node 1
> # Restart the node 0 and 1 (let them finish with resync)
> # Diverge the node 1
> a. Shutdown the node 0
> b. Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
> c. Shutdown the node 1
> # Restart the node 0 and 2 (let them finish with resync)
> # Diverge the node 0
> a. Shutdown the node 2
> b. Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
> c. Shutdown the node 0
> # Restart the node 1 and 2 (let them finish with resync)
> # Diverge the node 2
> a. Shutdown the node 1
> b. Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
> c. Shutdown the node 2
> # Restart the node 1 and 2 (let them finish with resync)
> # Restart the node 0 (let it finish with resync)
> Reading each key from each node directly will give us the output:
> {noformat}
> /testDivergenceResync0 on the node 0 = 0
> /testDivergenceResync0 on the node 1 = 0
> /testDivergenceResync0 on the node 2 = 0
> /testDivergenceResync1 on the node 0 = 1001
> /testDivergenceResync1 on the node 1 = 1001
> /testDivergenceResync1 on the node 2 = 1001
> /testDivergenceResync2 on the node 0 = 2
> /testDivergenceResync2 on the node 1 = 1002
> /testDivergenceResync2 on the node 2 = 2
> /testDivergenceResync3 on the node 0 = 3
> /testDivergenceResync3 on the node 1 = 3
> /testDivergenceResync3 on the node 2 = 3
> /testDivergenceResync4 on the node 0 = 1004
> /testDivergenceResync4 on the node 1 = 1004
> /testDivergenceResync4 on the node 2 = 1004
> {noformat}
> The value of key /testDivergenceResync2 is inconsistent across nodes -- node 1 has a
new value that will never be replicated to the other nodes.
> What seems to happen:
> # At the step 5, setData (at zxid 0x300000001) writing the value 1002 is committed on
the node 1.
> {panel:title=Log from the node 1:}
> ...
> 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174]
- Processing request:: sessionid:0x100011108080000 type:setData cxid:0x4 zxid:0x300000001
txntype:5 reqpath:n/a
> 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787]
- Proposing:: sessionid:0x100011108080000 type:setData cxid:0x4 zxid:0x300000001 txntype:5
reqpath:n/a
> 2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - Creating new
log file: log.300000001
> 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid:
0x300000001 is 1
> 2017-11-16 03:02:21,173 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512]
- Shutting down
> 2017-11-16 03:02:21,173 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518]
- Shutdown called
> java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced
with sids: [ 1 ]
> ...
> {panel}
> # At the step 8, the node 1 is restarted and supposed to be properly resync’ed with
the node 2 which is the leader.
>     a. The node 2 sends SNAP msg so that the node 1 can restore its in-memory data tree
from the snapshot of the in-memory data tree on the node 2. 
>     b. On the other hand, the node 1 will clear its in-memory data tree and restore it
with the snapshot from the node 2. Then, it takes its own snapshot at zxid 0x200000001. 
>     c. However, this does not remove the setData at zxid 0x300000001 from the transaction
log on the node 1.
> {panel:title=Log from the node 2:}
> ...
> 2017-11-16 03:02:37,470 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372]
- LEADING - LEADER ELECTION TOOK - 232
> 2017-11-16 03:02:37,479 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346]
- Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
> 2017-11-16 03:02:37,626 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401]
- Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x300000001
> 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472]
- proposals is empty
> 2017-11-16 03:02:37,626 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475]
- Sending SNAP
> 2017-11-16 03:02:37,626 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499]
- Sending snapshot last zxid of peer is 0x300000001  zxid of leader is 0x500000000sent zxid
of db as 0x200000001
> 2017-11-16 03:02:37,701 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46899:LearnerHandler@535]
- Received NEWLEADER-ACK message from 1
> 2017-11-16 03:02:37,702 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@962]
- Have quorum of supporters, sids: [ 1,2 ]; starting up and setting last processed zxid: 0x500000000
> ...
> {panel}
> {panel:title=Log from the node 1:}
> ...
> 2017-11-16 03:02:37,473 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Follower@65]
- FOLLOWING - LEADER ELECTION TOOK - 218
> 2017-11-16 03:02:37,475 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:QuorumPeer$QuorumServer@184]
- Resolved hostname: 
> 127.0.0.1 to address: /127.0.0.1
> 2017-11-16 03:02:37,593 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11224:NIOServerCnxnFactory@215]
- Accepted socket connection from /127.0.0.1:57338
> 2017-11-16 03:02:37,626 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Learner@336]
- Getting a snapshot from leader 0x200000001
> 2017-11-16 03:02:37,627 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11224:NIOServerCnxn@383]
- Exception causing close of 
> session 0x0: ZooKeeperServer not running
> 2017-11-16 03:02:37,627 [myid:1] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11224:NIOServerCnxn@386]
- IOException stack trace
> java.io.IOException: ZooKeeperServer not running
>         at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:977)
>         at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:257)
>         at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:226)
>         at java.lang.Thread.run(Thread.java:745)
> 2017-11-16 03:02:37,627 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11224:NIOServerCnxn@1040]
- Closed socket connection f
> or client /127.0.0.1:57338 (no session established for client)
> 2017-11-16 03:02:37,629 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:FileTxnSnapLog@248]
- Snapshotting: 0x200000001 to /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/1/version-2/snapshot.200000001
> ...
> {panel}
> # At the step 10, the node 1 is restarted again and supposed to be properly resync’ed
with the node 2 which is the leader again.
> a. When the node 1 is restarted, it restores its in-memory data tree from the snapshot
at zxid 0x200000001 and replay setData at zxid 0x300000001 (which actually needed to be truncated)
> b. However, the node 2 just sends DIFF containing setData written at 9th path, and no
truncation will be occurred.
> c. As a result, the node 1 still has the value 1002 while other nodes will have the value
2 for the same key
> {panel}
> {panel:title=Log from the node 2:}
> …
> 2017-11-16 03:03:21,033 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372]
- LEADING - LEADER ELECTION TOOK - 217
> 2017-11-16 03:03:21,038 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46967:LearnerHandler@346]
- Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@1e1cf18c
> 2017-11-16 03:03:21,103 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46967:LearnerHandler@401]
- Synchronizing with Follower sid: 1 maxCommittedLog=0x500000004 minCommittedLog=0x500000001
peerLastZxid=0x500000003
> 2017-11-16 03:03:21,103 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46967:LearnerHandler@415]
- proposal size is 4
> 2017-11-16 03:03:21,103 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46967:LearnerHandler@418]
- Sending proposals to follower
> 2017-11-16 03:03:21,103 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46967:LearnerHandler@475]
- Sending DIFF
> 2017-11-16 03:03:21,156 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11227:NIOServerCnxnFactory@215]
- Accepted socket connection from /127.0.0.1:49611
> 2017-11-16 03:03:21,178 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46967:LearnerHandler@535]
- Received NEWLEADER-ACK message from 1
> 2017-11-16 03:03:21,178 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@962]
- Have quorum of supporters, sids: [ 1,2 ]; starting up and setting last processed zxid: 0x600000000
> 2017-11-16 03:03:21,196 [myid:2] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11227:NIOServerCnxn@383]
- Exception causing close of session 0x0: ZooKeeperServer not running
> 2017-11-16 03:03:21,196 [myid:2] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11227:NIOServerCnxn@386]
- IOException stack trace
> java.io.IOException: ZooKeeperServer not running
>         at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:977)
>         at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:257)
>         at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:226)
>         at java.lang.Thread.run(Thread.java:745)
> 2017-11-16 03:03:21,196 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11227:NIOServerCnxn@1040]
- Closed socket connection for client /127.0.0.1:49611 (no session established for client)
> 2017-11-16 03:03:21,237 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46967:Leader@579]
- outstanding is 0
> ...
> {panel}
> {panel}
> {panel:title=Log from the node 1:}
> …
> 2017-11-16 03:03:21,034 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Follower@65]
- FOLLOWING - LEADER ELECTION TOOK - 222
> 2017-11-16 03:03:21,035 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:QuorumPeer$QuorumServer@184]
- Resolved hostname: 127.0.0.1 to address: /127.0.0.1
> 2017-11-16 03:03:21,104 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Learner@332]
- Getting a diff from the leader 0x500000004
> 2017-11-16 03:03:21,105 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Learner@387]
- Got zxid 0x500000004 expected 0x1
> 2017-11-16 03:03:21,189 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - Creating new
log file: log.500000004
> 2017-11-16 03:03:21,189 [myid:1] - DEBUG [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:CommitProcessor@164]
- Committing request:: sessionid:0x2000111082b0000 type:setData cxid:0x4 zxid:0x500000004
txntype:5 reqpath:n/a
> 2017-11-16 03:03:21,189 [myid:1] - DEBUG [CommitProcessor:1:FinalRequestProcessor@89]
- Processing request:: sessionid:0x2000111082b0000 type:setData cxid:0x4 zxid:0x500000004
txntype:5 reqpath:n/a
> ...
> {panel}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message