zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Beom Heyn Kim (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (ZOOKEEPER-2945) Synchronization code on the follower does not truncate uncommitted write when it receives SNAP from the leader
Date Mon, 20 Nov 2017 08:27:00 GMT

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Beom Heyn Kim updated ZOOKEEPER-2945:
-------------------------------------
    Description: 
Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating
the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency
can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the
inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but
it was for 3.4.10 and this one is for 3.4.11 and later)

Initially:
# Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader)
# Create 5 znodes with initial values as follow (key = value)
## /testDivergenceResync0 = 0
## /testDivergenceResync1 = 1
## /testDivergenceResync2 = 2
## /testDivergenceResync3 = 3
## /testDivergenceResync4 = 4

To Reproduce:
Diverge the node 2
Shutdown the node 0 and 1
Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
Shutdown the node 2
Restart the node 0 and 1 (let them finish with resync)
Diverge the node 1
Shutdown the node 0
Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
Shutdown the node 1
Restart the node 0 and 1 (let them finish with resync)
Diverge the node 1
Shutdown the node 0
Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
Shutdown the node 1
Restart the node 0 and 2 (let them finish with resync)
Diverge the node 0
Shutdown the node 2
Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
Shutdown the node 0
Restart the node 1 and 2 (let them finish with resync)
Diverge the node 2
Shutdown the node 1
Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
Shutdown the node 2
Restart the node 1 and 2 (let them finish with resync)
Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
At the step 5, setData (at zxid 0x300000001) writing the value 1002 is committed on the node
1.
Log from the node 1: 

2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174]
- Processing request:: sessionid:0x100011108080000 type:setData cxid:0x4 zxid:0x300000001
txntype:5 reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing::
sessionid:0x100011108080000 type:setData cxid:0x4 zxid:0x300000001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - Creating new log
file: log.300000001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x300000001
is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512]
- Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518]
- Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced
with sids: [ 1 ]
...
At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node
2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372]
- LEADING - LEADER ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346]
- Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401]
- Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x300000001
2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472]
- proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475]
- Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499]
- Sending snapshot last zxid of peer is 0x300000001  zxid of leader is 0x500000000sent zxid
of db as 0x200000001
2017-11-16 03:02:37,701 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46899:LearnerHandler@535]
- Received NEWLEADER-ACK message from 1
2017-11-16 03:02:37,702 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@962]
- Have quorum of supporters, sids: [ 1,2 ]; starting up and setting last processed zxid: 0x500000000
...
Log from the node 1:
...
2017-11-16 03:02:37,473 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Follower@65]
- FOLLOWING - LEADER ELECTION TOOK - 218
2017-11-16 03:02:37,475 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:QuorumPeer$QuorumServer@184]
- Resolved hostname: 
127.0.0.1 to address: /127.0.0.1
2017-11-16 03:02:37,593 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11224:NIOServerCnxnFactory@215]
- Accepted socket connection from /127.0.0.1:57338
2017-11-16 03:02:37,626 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Learner@336]
- Getting a snapshot from leader 0x200000001
2017-11-16 03:02:37,627 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11224:NIOServerCnxn@383]
- Exception causing close of 
session 0x0: ZooKeeperServer not running
2017-11-16 03:02:37,627 [myid:1] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11224:NIOServerCnxn@386]
- IOException stack trace
java.io.IOException: ZooKeeperServer not running
        at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:977)
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:257)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:226)
        at java.lang.Thread.run(Thread.java:745)
2017-11-16 03:02:37,627 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11224:NIOServerCnxn@1040]
- Closed socket connection f
or client /127.0.0.1:57338 (no session established for client)
2017-11-16 03:02:37,629 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:FileTxnSnapLog@248]
- Snapshotting: 0x200000001 to /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/1/version-2/snapshot.200000001
...
Comment:
The node 2 sends SNAP msg so that the node 1 can restore its in-memory data tree from the
snapshot of the in-memory data tree on the node 2. 
On the other hand, the node 1 will clear its in-memory data tree and restore it with the snapshot
from the node 2. Then, it takes its own snapshot at zxid 0x200000001. 
However, this does not remove the setData at zxid 0x300000001 from the transaction log on
the node 1.
At the step 10, the node 1 is restarted again and supposed to be properly resync’ed with
the node 2 which is the leader again.
Log from the node 2:
…
2017-11-16 03:03:21,033 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372]
- LEADING - LEADER ELECTION TOOK - 217
2017-11-16 03:03:21,038 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46967:LearnerHandler@346]
- Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@1e1cf18c
2017-11-16 03:03:21,103 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46967:LearnerHandler@401]
- Synchronizing with Follower sid: 1 maxCommittedLog=0x500000004 minCommittedLog=0x500000001
peerLastZxid=0x500000003
2017-11-16 03:03:21,103 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46967:LearnerHandler@415]
- proposal size is 4
2017-11-16 03:03:21,103 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46967:LearnerHandler@418]
- Sending proposals to follower
2017-11-16 03:03:21,103 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46967:LearnerHandler@475]
- Sending DIFF
2017-11-16 03:03:21,156 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11227:NIOServerCnxnFactory@215]
- Accepted socket connection from /127.0.0.1:49611
2017-11-16 03:03:21,178 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46967:LearnerHandler@535]
- Received NEWLEADER-ACK message from 1
2017-11-16 03:03:21,178 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@962]
- Have quorum of supporters, sids: [ 1,2 ]; starting up and setting last processed zxid: 0x600000000
2017-11-16 03:03:21,196 [myid:2] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11227:NIOServerCnxn@383]
- Exception causing close of session 0x0: ZooKeeperServer not running
2017-11-16 03:03:21,196 [myid:2] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11227:NIOServerCnxn@386]
- IOException stack trace
java.io.IOException: ZooKeeperServer not running
        at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:977)
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:257)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:226)
        at java.lang.Thread.run(Thread.java:745)
2017-11-16 03:03:21,196 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11227:NIOServerCnxn@1040]
- Closed socket connection for client /127.0.0.1:49611 (no session established for client)
2017-11-16 03:03:21,237 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46967:Leader@579] - outstanding
is 0
...
Log from the node 1:
…
2017-11-16 03:03:21,034 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Follower@65]
- FOLLOWING - LEADER ELECTION TOOK - 222
2017-11-16 03:03:21,035 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:QuorumPeer$QuorumServer@184]
- Resolved hostname: 127.0.0.1 to address: /127.0.0.1
2017-11-16 03:03:21,104 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Learner@332]
- Getting a diff from the leader 0x500000004
2017-11-16 03:03:21,105 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Learner@387]
- Got zxid 0x500000004 expected 0x1
2017-11-16 03:03:21,189 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - Creating new log
file: log.500000004
2017-11-16 03:03:21,189 [myid:1] - DEBUG [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:CommitProcessor@164]
- Committing request:: sessionid:0x2000111082b0000 type:setData cxid:0x4 zxid:0x500000004
txntype:5 reqpath:n/a
2017-11-16 03:03:21,189 [myid:1] - DEBUG [CommitProcessor:1:FinalRequestProcessor@89] - Processing
request:: sessionid:0x2000111082b0000 type:setData cxid:0x4 zxid:0x500000004 txntype:5 reqpath:n/a
...
Comment:
When the node 1 is restarted, it restores its in-memory data tree from the snapshot at zxid
0x200000001 and replay setData at zxid 0x300000001 (which actually needed to be truncated)
However, the node 2 just sends DIFF containing setData written at 9th path, and no truncation
will be occurred.
As a result, the node 1 still has the value 1002 while other nodes will have the value 2 for
the same key


  was:
Synchronization code in the syncWithLeader function of Learner.java seems not properly truncating
the uncommitted write when the follower receives SNAP msg from the leader. So, inconsistency
can occur in the in-memory data tree across nodes. Here is one procedure to reproduce the
inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832, but
it was for 3.4.10 and this one is for 3.4.11 and later)

Initially:
Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader)
Create 5 znodes with initial values as follow (key = value)
/testDivergenceResync0 = 0
/testDivergenceResync1 = 1
/testDivergenceResync2 = 2
/testDivergenceResync3 = 3
/testDivergenceResync4 = 4

To Reproduce:
Diverge the node 2
Shutdown the node 0 and 1
Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
Shutdown the node 2
Restart the node 0 and 1 (let them finish with resync)
Diverge the node 1
Shutdown the node 0
Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
Shutdown the node 1
Restart the node 0 and 1 (let them finish with resync)
Diverge the node 1
Shutdown the node 0
Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
Shutdown the node 1
Restart the node 0 and 2 (let them finish with resync)
Diverge the node 0
Shutdown the node 2
Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
Shutdown the node 0
Restart the node 1 and 2 (let them finish with resync)
Diverge the node 2
Shutdown the node 1
Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
Shutdown the node 2
Restart the node 1 and 2 (let them finish with resync)
Restart the node 0 (let it finish with resync)

Reading each key from each node directly will give us the output:
/testDivergenceResync0 on the node 0 = 0
/testDivergenceResync0 on the node 1 = 0
/testDivergenceResync0 on the node 2 = 0
/testDivergenceResync1 on the node 0 = 1001
/testDivergenceResync1 on the node 1 = 1001
/testDivergenceResync1 on the node 2 = 1001
/testDivergenceResync2 on the node 0 = 2
/testDivergenceResync2 on the node 1 = 1002
/testDivergenceResync2 on the node 2 = 2
/testDivergenceResync3 on the node 0 = 3
/testDivergenceResync3 on the node 1 = 3
/testDivergenceResync3 on the node 2 = 3
/testDivergenceResync4 on the node 0 = 1004
/testDivergenceResync4 on the node 1 = 1004
/testDivergenceResync4 on the node 2 = 1004
Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.

What seems to happen:
At the step 5, setData (at zxid 0x300000001) writing the value 1002 is committed on the node
1.
Log from the node 1: 

2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174]
- Processing request:: sessionid:0x100011108080000 type:setData cxid:0x4 zxid:0x300000001
txntype:5 reqpath:n/a
2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787] - Proposing::
sessionid:0x100011108080000 type:setData cxid:0x4 zxid:0x300000001 txntype:5 reqpath:n/a
2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - Creating new log
file: log.300000001
2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid: 0x300000001
is 1
2017-11-16 03:02:21,173 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512]
- Shutting down
2017-11-16 03:02:21,173 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518]
- Shutdown called
java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced
with sids: [ 1 ]
...
At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the node
2 which is the leader. 
Log from the node 2:
...
2017-11-16 03:02:37,470 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372]
- LEADING - LEADER ELECTION TOOK - 232
2017-11-16 03:02:37,479 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346]
- Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
2017-11-16 03:02:37,626 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401]
- Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x300000001
2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472]
- proposals is empty
2017-11-16 03:02:37,626 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475]
- Sending SNAP
2017-11-16 03:02:37,626 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499]
- Sending snapshot last zxid of peer is 0x300000001  zxid of leader is 0x500000000sent zxid
of db as 0x200000001
2017-11-16 03:02:37,701 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46899:LearnerHandler@535]
- Received NEWLEADER-ACK message from 1
2017-11-16 03:02:37,702 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@962]
- Have quorum of supporters, sids: [ 1,2 ]; starting up and setting last processed zxid: 0x500000000
...
Log from the node 1:
...
2017-11-16 03:02:37,473 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Follower@65]
- FOLLOWING - LEADER ELECTION TOOK - 218
2017-11-16 03:02:37,475 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:QuorumPeer$QuorumServer@184]
- Resolved hostname: 
127.0.0.1 to address: /127.0.0.1
2017-11-16 03:02:37,593 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11224:NIOServerCnxnFactory@215]
- Accepted socket connection from /127.0.0.1:57338
2017-11-16 03:02:37,626 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Learner@336]
- Getting a snapshot from leader 0x200000001
2017-11-16 03:02:37,627 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11224:NIOServerCnxn@383]
- Exception causing close of 
session 0x0: ZooKeeperServer not running
2017-11-16 03:02:37,627 [myid:1] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11224:NIOServerCnxn@386]
- IOException stack trace
java.io.IOException: ZooKeeperServer not running
        at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:977)
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:257)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:226)
        at java.lang.Thread.run(Thread.java:745)
2017-11-16 03:02:37,627 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11224:NIOServerCnxn@1040]
- Closed socket connection f
or client /127.0.0.1:57338 (no session established for client)
2017-11-16 03:02:37,629 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:FileTxnSnapLog@248]
- Snapshotting: 0x200000001 to /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/1/version-2/snapshot.200000001
...
Comment:
The node 2 sends SNAP msg so that the node 1 can restore its in-memory data tree from the
snapshot of the in-memory data tree on the node 2. 
On the other hand, the node 1 will clear its in-memory data tree and restore it with the snapshot
from the node 2. Then, it takes its own snapshot at zxid 0x200000001. 
However, this does not remove the setData at zxid 0x300000001 from the transaction log on
the node 1.
At the step 10, the node 1 is restarted again and supposed to be properly resync’ed with
the node 2 which is the leader again.
Log from the node 2:
…
2017-11-16 03:03:21,033 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372]
- LEADING - LEADER ELECTION TOOK - 217
2017-11-16 03:03:21,038 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46967:LearnerHandler@346]
- Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@1e1cf18c
2017-11-16 03:03:21,103 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46967:LearnerHandler@401]
- Synchronizing with Follower sid: 1 maxCommittedLog=0x500000004 minCommittedLog=0x500000001
peerLastZxid=0x500000003
2017-11-16 03:03:21,103 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46967:LearnerHandler@415]
- proposal size is 4
2017-11-16 03:03:21,103 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46967:LearnerHandler@418]
- Sending proposals to follower
2017-11-16 03:03:21,103 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46967:LearnerHandler@475]
- Sending DIFF
2017-11-16 03:03:21,156 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11227:NIOServerCnxnFactory@215]
- Accepted socket connection from /127.0.0.1:49611
2017-11-16 03:03:21,178 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46967:LearnerHandler@535]
- Received NEWLEADER-ACK message from 1
2017-11-16 03:03:21,178 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@962]
- Have quorum of supporters, sids: [ 1,2 ]; starting up and setting last processed zxid: 0x600000000
2017-11-16 03:03:21,196 [myid:2] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11227:NIOServerCnxn@383]
- Exception causing close of session 0x0: ZooKeeperServer not running
2017-11-16 03:03:21,196 [myid:2] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11227:NIOServerCnxn@386]
- IOException stack trace
java.io.IOException: ZooKeeperServer not running
        at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:977)
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:257)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:226)
        at java.lang.Thread.run(Thread.java:745)
2017-11-16 03:03:21,196 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11227:NIOServerCnxn@1040]
- Closed socket connection for client /127.0.0.1:49611 (no session established for client)
2017-11-16 03:03:21,237 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46967:Leader@579] - outstanding
is 0
...
Log from the node 1:
…
2017-11-16 03:03:21,034 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Follower@65]
- FOLLOWING - LEADER ELECTION TOOK - 222
2017-11-16 03:03:21,035 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:QuorumPeer$QuorumServer@184]
- Resolved hostname: 127.0.0.1 to address: /127.0.0.1
2017-11-16 03:03:21,104 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Learner@332]
- Getting a diff from the leader 0x500000004
2017-11-16 03:03:21,105 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Learner@387]
- Got zxid 0x500000004 expected 0x1
2017-11-16 03:03:21,189 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - Creating new log
file: log.500000004
2017-11-16 03:03:21,189 [myid:1] - DEBUG [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:CommitProcessor@164]
- Committing request:: sessionid:0x2000111082b0000 type:setData cxid:0x4 zxid:0x500000004
txntype:5 reqpath:n/a
2017-11-16 03:03:21,189 [myid:1] - DEBUG [CommitProcessor:1:FinalRequestProcessor@89] - Processing
request:: sessionid:0x2000111082b0000 type:setData cxid:0x4 zxid:0x500000004 txntype:5 reqpath:n/a
...
Comment:
When the node 1 is restarted, it restores its in-memory data tree from the snapshot at zxid
0x200000001 and replay setData at zxid 0x300000001 (which actually needed to be truncated)
However, the node 2 just sends DIFF containing setData written at 9th path, and no truncation
will be occurred.
As a result, the node 1 still has the value 1002 while other nodes will have the value 2 for
the same key



> Synchronization code on the follower does not truncate uncommitted write when it receives
SNAP from the leader
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2945
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2945
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.11, 3.4.12
>            Reporter: Beom Heyn Kim
>
> Synchronization code in the syncWithLeader function of Learner.java seems not properly
truncating the uncommitted write when the follower receives SNAP msg from the leader. So,
inconsistency can occur in the in-memory data tree across nodes. Here is one procedure to
reproduce the inconsistency. (Actually, this seems similar to my previous report on ZOOKEEPER-2832,
but it was for 3.4.10 and this one is for 3.4.11 and later)
> Initially:
> # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the leader)
> # Create 5 znodes with initial values as follow (key = value)
> ## /testDivergenceResync0 = 0
> ## /testDivergenceResync1 = 1
> ## /testDivergenceResync2 = 2
> ## /testDivergenceResync3 = 3
> ## /testDivergenceResync4 = 4
> To Reproduce:
> Diverge the node 2
> Shutdown the node 0 and 1
> Async setData to the node 2 writing 1000 to the key ‘/testDivergenceResync0’
> Shutdown the node 2
> Restart the node 0 and 1 (let them finish with resync)
> Diverge the node 1
> Shutdown the node 0
> Async setData to the node 1 writing 1001 to the key ‘/testDivergenceResync1’
> Shutdown the node 1
> Restart the node 0 and 1 (let them finish with resync)
> Diverge the node 1
> Shutdown the node 0
> Async setData to the node 1 writing 1002 to the key ‘/testDivergenceResync2’
> Shutdown the node 1
> Restart the node 0 and 2 (let them finish with resync)
> Diverge the node 0
> Shutdown the node 2
> Async setData to the node 0 writing 1003 to the key ‘/testDivergenceResync3’
> Shutdown the node 0
> Restart the node 1 and 2 (let them finish with resync)
> Diverge the node 2
> Shutdown the node 1
> Async setData to the node 2 writing 1004 to the key ‘/testDivergenceResync4’
> Shutdown the node 2
> Restart the node 1 and 2 (let them finish with resync)
> Restart the node 0 (let it finish with resync)
> Reading each key from each node directly will give us the output:
> /testDivergenceResync0 on the node 0 = 0
> /testDivergenceResync0 on the node 1 = 0
> /testDivergenceResync0 on the node 2 = 0
> /testDivergenceResync1 on the node 0 = 1001
> /testDivergenceResync1 on the node 1 = 1001
> /testDivergenceResync1 on the node 2 = 1001
> /testDivergenceResync2 on the node 0 = 2
> /testDivergenceResync2 on the node 1 = 1002
> /testDivergenceResync2 on the node 2 = 2
> /testDivergenceResync3 on the node 0 = 3
> /testDivergenceResync3 on the node 1 = 3
> /testDivergenceResync3 on the node 2 = 3
> /testDivergenceResync4 on the node 0 = 1004
> /testDivergenceResync4 on the node 1 = 1004
> /testDivergenceResync4 on the node 2 = 1004
> Thus, the value of key /testDivergenceResync2 is inconsistent across nodes.
> What seems to happen:
> At the step 5, setData (at zxid 0x300000001) writing the value 1002 is committed on the
node 1.
> Log from the node 1: 
> 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::CommitProcessor@174]
- Processing request:: sessionid:0x100011108080000 type:setData cxid:0x4 zxid:0x300000001
txntype:5 reqpath:n/a
> 2017-11-16 03:02:19,964 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::Leader@787]
- Proposing:: sessionid:0x100011108080000 type:setData cxid:0x4 zxid:0x300000001 txntype:5
reqpath:n/a
> 2017-11-16 03:02:19,965 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - Creating new
log file: log.300000001
> 2017-11-16 03:02:20,016 [myid:1] - DEBUG [SyncThread:1:Leader@600] - Count for zxid:
0x300000001 is 1
> 2017-11-16 03:02:21,173 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@512]
- Shutting down
> 2017-11-16 03:02:21,173 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@518]
- Shutdown called
> java.lang.Exception: shutdown Leader! reason: Not sufficient followers synced, only synced
with sids: [ 1 ]
> ...
> At the step 8, the node 1 is restarted and supposed to be properly resync’ed with the
node 2 which is the leader. 
> Log from the node 2:
> ...
> 2017-11-16 03:02:37,470 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372]
- LEADING - LEADER ELECTION TOOK - 232
> 2017-11-16 03:02:37,479 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46899:LearnerHandler@346]
- Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@46cc2846
> 2017-11-16 03:02:37,626 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46899:LearnerHandler@401]
- Synchronizing with Follower sid: 1 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x300000001
> 2017-11-16 03:02:37,626 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46899:LearnerHandler@472]
- proposals is empty
> 2017-11-16 03:02:37,626 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46899:LearnerHandler@475]
- Sending SNAP
> 2017-11-16 03:02:37,626 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46899:LearnerHandler@499]
- Sending snapshot last zxid of peer is 0x300000001  zxid of leader is 0x500000000sent zxid
of db as 0x200000001
> 2017-11-16 03:02:37,701 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46899:LearnerHandler@535]
- Received NEWLEADER-ACK message from 1
> 2017-11-16 03:02:37,702 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@962]
- Have quorum of supporters, sids: [ 1,2 ]; starting up and setting last processed zxid: 0x500000000
> ...
> Log from the node 1:
> ...
> 2017-11-16 03:02:37,473 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Follower@65]
- FOLLOWING - LEADER ELECTION TOOK - 218
> 2017-11-16 03:02:37,475 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:QuorumPeer$QuorumServer@184]
- Resolved hostname: 
> 127.0.0.1 to address: /127.0.0.1
> 2017-11-16 03:02:37,593 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11224:NIOServerCnxnFactory@215]
- Accepted socket connection from /127.0.0.1:57338
> 2017-11-16 03:02:37,626 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Learner@336]
- Getting a snapshot from leader 0x200000001
> 2017-11-16 03:02:37,627 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11224:NIOServerCnxn@383]
- Exception causing close of 
> session 0x0: ZooKeeperServer not running
> 2017-11-16 03:02:37,627 [myid:1] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11224:NIOServerCnxn@386]
- IOException stack trace
> java.io.IOException: ZooKeeperServer not running
>         at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:977)
>         at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:257)
>         at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:226)
>         at java.lang.Thread.run(Thread.java:745)
> 2017-11-16 03:02:37,627 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11224:NIOServerCnxn@1040]
- Closed socket connection f
> or client /127.0.0.1:57338 (no session established for client)
> 2017-11-16 03:02:37,629 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:FileTxnSnapLog@248]
- Snapshotting: 0x200000001 to /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/1/version-2/snapshot.200000001
> ...
> Comment:
> The node 2 sends SNAP msg so that the node 1 can restore its in-memory data tree from
the snapshot of the in-memory data tree on the node 2. 
> On the other hand, the node 1 will clear its in-memory data tree and restore it with
the snapshot from the node 2. Then, it takes its own snapshot at zxid 0x200000001. 
> However, this does not remove the setData at zxid 0x300000001 from the transaction log
on the node 1.
> At the step 10, the node 1 is restarted again and supposed to be properly resync’ed
with the node 2 which is the leader again.
> Log from the node 2:
> …
> 2017-11-16 03:03:21,033 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@372]
- LEADING - LEADER ELECTION TOOK - 217
> 2017-11-16 03:03:21,038 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46967:LearnerHandler@346]
- Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@1e1cf18c
> 2017-11-16 03:03:21,103 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46967:LearnerHandler@401]
- Synchronizing with Follower sid: 1 maxCommittedLog=0x500000004 minCommittedLog=0x500000001
peerLastZxid=0x500000003
> 2017-11-16 03:03:21,103 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46967:LearnerHandler@415]
- proposal size is 4
> 2017-11-16 03:03:21,103 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46967:LearnerHandler@418]
- Sending proposals to follower
> 2017-11-16 03:03:21,103 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46967:LearnerHandler@475]
- Sending DIFF
> 2017-11-16 03:03:21,156 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11227:NIOServerCnxnFactory@215]
- Accepted socket connection from /127.0.0.1:49611
> 2017-11-16 03:03:21,178 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:46967:LearnerHandler@535]
- Received NEWLEADER-ACK message from 1
> 2017-11-16 03:03:21,178 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@962]
- Have quorum of supporters, sids: [ 1,2 ]; starting up and setting last processed zxid: 0x600000000
> 2017-11-16 03:03:21,196 [myid:2] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11227:NIOServerCnxn@383]
- Exception causing close of session 0x0: ZooKeeperServer not running
> 2017-11-16 03:03:21,196 [myid:2] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11227:NIOServerCnxn@386]
- IOException stack trace
> java.io.IOException: ZooKeeperServer not running
>         at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:977)
>         at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:257)
>         at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:226)
>         at java.lang.Thread.run(Thread.java:745)
> 2017-11-16 03:03:21,196 [myid:2] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11227:NIOServerCnxn@1040]
- Closed socket connection for client /127.0.0.1:49611 (no session established for client)
> 2017-11-16 03:03:21,237 [myid:2] - DEBUG [LearnerHandler-/127.0.0.1:46967:Leader@579]
- outstanding is 0
> ...
> Log from the node 1:
> …
> 2017-11-16 03:03:21,034 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Follower@65]
- FOLLOWING - LEADER ELECTION TOOK - 222
> 2017-11-16 03:03:21,035 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:QuorumPeer$QuorumServer@184]
- Resolved hostname: 127.0.0.1 to address: /127.0.0.1
> 2017-11-16 03:03:21,104 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Learner@332]
- Getting a diff from the leader 0x500000004
> 2017-11-16 03:03:21,105 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Learner@387]
- Got zxid 0x500000004 expected 0x1
> 2017-11-16 03:03:21,189 [myid:1] - INFO  [SyncThread:1:FileTxnLog@209] - Creating new
log file: log.500000004
> 2017-11-16 03:03:21,189 [myid:1] - DEBUG [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:CommitProcessor@164]
- Committing request:: sessionid:0x2000111082b0000 type:setData cxid:0x4 zxid:0x500000004
txntype:5 reqpath:n/a
> 2017-11-16 03:03:21,189 [myid:1] - DEBUG [CommitProcessor:1:FinalRequestProcessor@89]
- Processing request:: sessionid:0x2000111082b0000 type:setData cxid:0x4 zxid:0x500000004
txntype:5 reqpath:n/a
> ...
> Comment:
> When the node 1 is restarted, it restores its in-memory data tree from the snapshot at
zxid 0x200000001 and replay setData at zxid 0x300000001 (which actually needed to be truncated)
> However, the node 2 just sends DIFF containing setData written at 9th path, and no truncation
will be occurred.
> As a result, the node 1 still has the value 1002 while other nodes will have the value
2 for the same key



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message