zookeeper-dev mailing list archives

From "Kfir Lev-Ari (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ZOOKEEPER-2684) Fix a crashing bug in the mixed workloads commit processor
Date Thu, 09 Feb 2017 08:41:41 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15859210#comment-15859210 ]

Kfir Lev-Ari edited comment on ZOOKEEPER-2684 at 2/9/17 8:41 AM:
-----------------------------------------------------------------

[~nerdyyatrice], can you please describe the scenario in which the same request is processed
in the queue twice? 

As I see it, if a request r is received from a local client, then r is added to the queue
(note that r was already sent to the leader prior to that point).

Once a commit arrives from the leader, r is processed, and r won't return to the queue, regardless
of a possible client disconnection (AFAIK, the connection is only needed at the end of the
line, when some kind of result is returned).

Now, let's say the client gets disconnected at some point in the time frame above, while r is
being processed, and reconnects to some server (the same server or a different one).

If a commit arrives at a different server, r will be processed as if it belongs to a remote
client, i.e., we will only perform the update, without using the connection. I'm not sure
that after a disconnection ZK is required to inform the client's new session about its past
actions (but I guess that can also be fixed if needed).
If a commit arrives and r is in the queue waiting for it, then r is processed as if it belongs
to a locally connected client, but eventually the connection handle will show that the connection
has ended (if I remember the code correctly), so there is nothing to report, and ZK continues as usual.
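
To make the two cases above concrete, here is a minimal toy model in Python. It is only a
sketch of the behavior described in this comment, not ZooKeeper's actual CommitProcessor:
the names ToyCommitProcessor, submit_local, disconnect and on_commit are hypothetical, and
the session id and cxid values are made up.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Request:
    session_id: int
    cxid: int


@dataclass
class ToyCommitProcessor:
    """Toy model only; every name here is hypothetical."""
    pending: dict = field(default_factory=dict)   # session_id -> deque of queued local requests
    connected: set = field(default_factory=set)   # sessions with a live local connection
    applied: list = field(default_factory=list)   # updates performed on this server
    replies: list = field(default_factory=list)   # results actually returned to a client

    def submit_local(self, req):
        # Assumption: the request was already forwarded to the leader;
        # here we only queue it until its commit comes back.
        self.connected.add(req.session_id)
        self.pending.setdefault(req.session_id, deque()).append(req)

    def disconnect(self, session_id):
        self.connected.discard(session_id)

    def on_commit(self, req):
        queue = self.pending.get(req.session_id)
        is_local = bool(queue) and queue[0] == req
        if is_local:
            queue.popleft()            # r leaves the queue and is never re-added
        self.applied.append(req)       # the update is performed in both cases
        if is_local and req.session_id in self.connected:
            self.replies.append(req)   # the connection is only needed here
        return "local" if is_local else "remote"


server_a = ToyCommitProcessor()   # server the client was originally connected to
server_b = ToyCommitProcessor()   # some other server
r = Request(session_id=0x1, cxid=7)

server_a.submit_local(r)
server_a.disconnect(r.session_id)   # client drops before the commit arrives
kind_a = server_a.on_commit(r)      # r was waiting in the queue
kind_b = server_b.on_commit(r)      # r was never queued on this server
```

In this model r is applied exactly once on each server; server_a handles it as a local
request but has no connection left to reply on, and server_b handles it as a remote one.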


Note that if a client writes something with a lower cxid than r, the commit processor doesn't
track that, i.e., it is possible that the next head after r will have a lower cxid
than r. We only care about the order of commits that we receive from the leader, and that
order can't be changed, because it is based on the network protocol order of messages (i.e.,
if r was already sent to the leader, then clearly r is committed prior to any new message
from the same client).
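
As a small illustration of the point about ordering, the Python sketch below applies a
made-up commit stream strictly in arrival order and separately records where a session's
cxid went down. The stream contents and the helper name cxid_inversions are assumptions
for the example only; nothing here is taken from the ZooKeeper code.

```python
def cxid_inversions(commits):
    """Return (session_id, previous_cxid, current_cxid) for each drop in cxid."""
    last_seen = {}
    inversions = []
    for session_id, cxid in commits:
        prev = last_seen.get(session_id)
        if prev is not None and cxid < prev:
            inversions.append((session_id, prev, cxid))
        last_seen[session_id] = cxid
    return inversions


# Hypothetical commit stream, in the order received from the leader:
# (session_id, cxid)
commit_stream = [(0xA, 10), (0xB, 4), (0xA, 3), (0xB, 5)]

# A consumer that only cares about commit order applies them as they arrive.
processed = []
for commit in commit_stream:
    processed.append(commit)

inversions = cxid_inversions(commit_stream)
```

Here the consumer that follows commit order is unaffected by the drop from cxid 10 to 3 in
session 0xA, whereas a check that expects cxids to only increase per session would flag it.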

Bottom line, it seems like r is processed only once per processor. What am I missing?


> Fix a crashing bug in the mixed workloads commit processor
> ----------------------------------------------------------
>
>                 Key: ZOOKEEPER-2684
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2684
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.6.0
>         Environment: with pretty heavy load on a real cluster
>            Reporter: Ryan Zhang
>            Assignee: Ryan Zhang
>            Priority: Blocker
>         Attachments: ZOOKEEPER-2684.patch
>
>
> We deployed our build with ZOOKEEPER-2024 and it quickly started to crash with the following error:
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:24:42,305 - ERROR [CommitProcessor:2] -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268) – Got cxid 0x119fa expected 0x11fc5 for client session id 1009079ba470055
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:32:04,746 - ERROR [CommitProcessor:2] -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268) – Got cxid 0x698 expected 0x928 for client session id 4002eeb3fd0009d
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:34:46,648 - ERROR [CommitProcessor:2] -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268) – Got cxid 0x8904 expected 0x8f34 for client session id 51b8905c90251
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:43:46,834 - ERROR [CommitProcessor:2] -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268) – Got cxid 0x3a8d expected 0x3ebc for client session id 2051af11af900cc
> Clearly something is not right in the new commit processor's per-session queue implementation.
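
For anyone comparing the reported values, the following Python sketch pulls the "got" and
"expected" cxids out of the four error messages quoted above and computes the gap for each
session. The regular expression and the names PATTERN, parse and gaps are assumptions for
this example; only the message text itself comes from the report.

```python
import re

# Message portion of the four log lines quoted in the issue description.
LOG_LINES = [
    "Got cxid 0x119fa expected 0x11fc5 for client session id 1009079ba470055",
    "Got cxid 0x698 expected 0x928 for client session id 4002eeb3fd0009d",
    "Got cxid 0x8904 expected 0x8f34 for client session id 51b8905c90251",
    "Got cxid 0x3a8d expected 0x3ebc for client session id 2051af11af900cc",
]

PATTERN = re.compile(
    r"Got cxid (0x[0-9a-f]+) expected (0x[0-9a-f]+) for client session id ([0-9a-f]+)"
)


def parse(line):
    """Return (session_id, got_cxid, expected_cxid), or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    got, expected, session = match.groups()
    return session, int(got, 16), int(expected, 16)


# session id -> (expected cxid - got cxid)
gaps = {}
for line in LOG_LINES:
    parsed = parse(line)
    if parsed is not None:
        session, got, expected = parsed
        gaps[session] = expected - got
```

In all four quoted lines the cxid that was received is lower than the one that was expected.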



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
