zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ryan Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2684) Fix a crashing bug in the mixed workloads commit processor
Date Thu, 16 Feb 2017 06:53:42 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869349#comment-15869349
] 

Ryan Zhang commented on ZOOKEEPER-2684:
---------------------------------------

so here is what actually happened

On the server A, the request is processed right after the session connection is closed and
the request is forwarded to the lead

investigate.log:2017-02-14 23:30:50,373 - INFO [FollowerRequestProcessor:4] - Processing request::
sessionid:0x4006b1b17808b0c type:getData cxid:0x14 zxid:0xfffffffffffffffe txntype:unknown
reqpath:/loadgen/load_1775503086
investigate.log:2017-02-14 23:30:54,327 - INFO [FollowerRequestProcessor:4] - Processing request::
sessionid:0x4006b1b17808b0c type:getData cxid:0x15 zxid:0xfffffffffffffffe txntype:unknown
reqpath:/loadgen/load_1775503083
investigate.log:2017-02-14 23:30:57,888 - INFO [FollowerRequestProcessor:4] - Processing request::
sessionid:0x4006b1b17808b0c type:getData cxid:0x16 zxid:0xfffffffffffffffe txntype:unknown
reqpath:/loadgen/load_1775503084
investigate.log:2017-02-14 23:31:01,583 - INFO [FollowerRequestProcessor:4] - Processing request::
sessionid:0x4006b1b17808b0c type:getData cxid:0x17 zxid:0xfffffffffffffffe txntype:unknown
reqpath:/loadgen/load_1775493180
investigate.log:2017-02-14 23:31:04,867 - INFO [FollowerRequestProcessor:4] - Processing request::
sessionid:0x4006b1b17808b0c type:getData cxid:0x18 zxid:0xfffffffffffffffe txntype:unknown
reqpath:/loadgen/load_1775493181
investigate.log:2017-02-14 23:31:08,150 - INFO [FollowerRequestProcessor:4] - Processing request::
sessionid:0x4006b1b17808b0c type:getData cxid:0x19 zxid:0xfffffffffffffffe txntype:unknown
reqpath:/loadgen/load_1775985079
investigate.log:2017-02-14 23:31:12,672 - INFO [FollowerRequestProcessor:4] - Processing request::
sessionid:0x4006b1b17808b0c type:getChildren2 cxid:0x1a zxid:0xfffffffffffffffe txntype:unknown
reqpath:/loadgen
investigate.log:2017-02-14 23:31:17,293 - WARN [NIOWorkerThread-16] - Exception causing close
of session 0x4006b1b17808b0c: Broken pipe
investigate.log:2017-02-14 23:31:17,297 - INFO [FollowerRequestProcessor:4] - Processing request::
sessionid:0x4006b1b17808b0c type:delete cxid:0x1b zxid:0xfffffffffffffffe txntype:unknown
reqpath:/loadgen/load_1776069644
investigate.log:2017-02-14 23:31:21,816 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)]
- Committing request:: sessionid:0x4006b1b17808b0c type:delete cxid:0x27 zxid:0x385012a1e94
txntype:2 reqpath:n/a
investigate.log:2017-02-14 23:31:21,816 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)]
- Committing request:: sessionid:0x4006b1b17808b0c type:create cxid:0x28 zxid:0x385012a1e95
txntype:1 reqpath:n/a
investigate.log:2017-02-14 23:31:21,816 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)]
- Committing request:: sessionid:0x4006b1b17808b0c type:delete cxid:0x29 zxid:0x385012a1e96
txntype:2 reqpath:n/a
investigate.log:2017-02-14 23:31:21,816 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)]
- Committing request:: sessionid:0x4006b1b17808b0c type:create cxid:0x2a zxid:0x385012a1e98
txntype:1 reqpath:n/a
investigate.log:2017-02-14 23:31:21,817 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)]
- Committing request:: sessionid:0x4006b1b17808b0c type:delete cxid:0x2b zxid:0x385012a1e9b
txntype:2 reqpath:n/a
investigate.log:2017-02-14 23:31:21,817 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)]
- Committing request:: sessionid:0x4006b1b17808b0c type:create cxid:0x2c zxid:0x385012a1e9c
txntype:1 reqpath:n/a
investigate.log:2017-02-14 23:31:21,818 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)]
- Committing request:: sessionid:0x4006b1b17808b0c type:delete cxid:0x2d zxid:0x385012a1ea1
txntype:2 reqpath:n/a
investigate.log:2017-02-14 23:31:21,822 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)]
- Committing request:: sessionid:0x4006b1b17808b0c type:create cxid:0x2e zxid:0x385012a1eb8
txntype:1 reqpath:n/a
investigate.log:2017-02-14 23:31:21,824 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)]
- Committing request:: sessionid:0x4006b1b17808b0c type:delete cxid:0x2f zxid:0x385012a1ec0
txntype:2 reqpath:n/a
investigate.log:2017-02-14 23:31:21,833 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)]
- Committing request:: sessionid:0x4006b1b17808b0c type:create cxid:0x30 zxid:0x385012a1ee9
txntype:1 reqpath:n/a
investigate.log:2017-02-14 23:31:22,048 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)]
- Committing request:: sessionid:0x4006b1b17808b0c type:error cxid:0x1b zxid:0x385012a2156
txntype:-1 reqpath:n/a
investigate.log:2017-02-14 23:31:22,690 - ERROR [CommitProcessor:4] - Got request sessionid:0x4006b1b17808b0c
type:delete cxid:0x27 zxid:0x385012a1e94 txntype:2 reqpath:n/a
investigate.log: but we are expecting request sessionid:0x4006b1b17808b0c type:delete cxid:0x1b
zxid:0xfffffffffffffffe txntype:unknown reqpath:/loadgen/load_1776069644
investigate.log:2017-02-14 23:32:39,172 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)]
- Committing request:: sessionid:0x4006b1b17808b0c type:create cxid:0x4c zxid:0x385012b34fa
txntype:1 reqpath:n/a
investigate.log:2017-02-14 23:32:48,677 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)]
- Committing request:: sessionid:0x4006b1b17808b0c type:delete cxid:0x4e zxid:0x385012b6c31
txntype:2 reqpath:n/a
investigate.log:2017-02-14 23:32:54,580 - INFO [QuorumPeer[myid=4](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled)]
- Committing request:: se


while on the lead side, the requests are arrived out of order (in terms of Cxid) 

2017-02-14 23:31:21,822 - INFO [LearnerHandler-/10.45.111.116:53431] - Committing request::
sessionid:0x4006b1b17808b0c type:create cxid:0x2e zxid:0x385012a1eb8 txntype:1 reqpath:n/a
2017-02-14 23:31:21,824 - INFO [LearnerHandler-/10.45.111.116:53431] - Commit a proposal:
sessionid:0x4006b1b17808b0c type:delete cxid:0x2f zxid:0x385012a1ec0 txntype:2 reqpath:n/a
2017-02-14 23:31:21,824 - INFO [LearnerHandler-/10.45.111.116:53431] - Committing request::
sessionid:0x4006b1b17808b0c type:delete cxid:0x2f zxid:0x385012a1ec0 txntype:2 reqpath:n/a
2017-02-14 23:31:21,833 - INFO [LearnerHandler-/10.45.111.116:53431] - Commit a proposal:
sessionid:0x4006b1b17808b0c type:create cxid:0x30 zxid:0x385012a1ee9 txntype:1 reqpath:n/a
2017-02-14 23:31:21,833 - INFO [LearnerHandler-/10.45.111.116:53431] - Committing request::
sessionid:0x4006b1b17808b0c type:create cxid:0x30 zxid:0x385012a1ee9 txntype:1 reqpath:n/a
2017-02-14 23:31:21,999 - DEBUG [ProcessThread(sid:1 cport:-1):] - Got user-level KeeperException
when processing sessionid:0x4006b1b17808b0c type:delete cxid:0x1b zxid:0x385012a2156 txntype:-1
reqpath:/loadgen/load_1776069644 Error Path:null Error:KeeperErrorCode = Session moved
investigate.log.1:2017-02-14 23:31:21,999 - INFO [ProcessThread(sid:1 cport:-1):] - Processing
request:: sessionid:0x4006b1b17808b0c type:delete cxid:0x1b zxid:0x385012a2156 txntype:-1
reqpath:/loadgen/load_1776069644
investigate.log.1:2017-02-14 23:31:21,999 - INFO [ProcessThread(sid:1 cport:-1):] - Leader
proposing: sessionid:0x4006b1b17808b0c type:delete cxid:0x1b zxid:0x385012a2156 txntype:-1
reqpath:/loadgen/load_1776069644
investigate.log.1:2017-02-14 23:31:22,048 - INFO [LearnerHandler-/10.45.112.127:53292] - Commit
a proposal: sessionid:0x4006b1b17808b0c type:delete cxid:0x1b zxid:0x385012a2156 txntype:-1
reqpath:/loadgen/load_1776069644
investigate.log.1:2017-02-14 23:31:22,048 - INFO [LearnerHandler-/10.45.112.127:53292] - Committing
request:: sessionid:0x4006b1b17808b0c type:delete cxid:0x1b zxid:0x385012a2156 txntype:-1
reqpath:/loadgen/load_1776069644
2017-02-14 23:31:26,728 - INFO [ProcessThread(sid:1 cport:-1):] - Processing request:: sessionid:0x4006b1b17808b0c
type:delete cxid:0x32 zxid:0x385012a3bd0 txntype:2 reqpath:n/a
2017-02-14 23:31:26,728 - INFO [ProcessThread(sid:1 cport:-1):] - Leader proposing: sessionid:0x4006b1b17808b0c
type:delete cxid:0x32 zxid:0x385012a3bd0 txntype:2 reqpath:n/a
2017-02-14 23:31:26,728 - INFO [ProcessThread(sid:1 cport:-1):] - Processing request:: sessionid:0x4006b1b17808b0c
type:create cxid:0x33 zxid:0x385012a3bd1 txntype:1 reqpath:n/a

Ideally, I would like to check if the server still owns the session (has a connection open)
but it seems there is no easy way to check that. Any thoughts? The worst case is just be like
the un-patched version to not check at all...

> Fix a crashing bug in the mixed workloads commit processor
> ----------------------------------------------------------
>
>                 Key: ZOOKEEPER-2684
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2684
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.6.0
>         Environment: with pretty heavy load on a real cluster
>            Reporter: Ryan Zhang
>            Assignee: Ryan Zhang
>            Priority: Blocker
>         Attachments: ZOOKEEPER-2684.patch
>
>
> We deployed our build with ZOOKEEPER-2024 and it quickly started to crash with the following
error
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:24:42,305 - ERROR [CommitProcessor:2] -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268)
– Got cxid 0x119fa expected 0x11fc5 for client session id 1009079ba470055
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:32:04,746 - ERROR [CommitProcessor:2] -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268)
– Got cxid 0x698 expected 0x928 for client session id 4002eeb3fd0009d
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:34:46,648 - ERROR [CommitProcessor:2] -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268)
– Got cxid 0x8904 expected 0x8f34 for client session id 51b8905c90251
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:43:46,834 - ERROR [CommitProcessor:2] -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268)
– Got cxid 0x3a8d expected 0x3ebc for client session id 2051af11af900cc
> clearly something is not right in the new commit processor per session queue implementation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message