zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2684) Fix a crashing bug in the mixed workloads commit processor
Date Mon, 30 Oct 2017 22:40:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225873#comment-16225873
] 

ASF GitHub Bot commented on ZOOKEEPER-2684:
-------------------------------------------

Github user shralex commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/411#discussion_r147854385
  
    --- Diff: src/java/test/org/apache/zookeeper/server/quorum/CommitProcessorConcurrencyTest.java
---
    @@ -376,4 +377,123 @@ public void noStarvationOfReadRequestsTest() throws Exception {
                         !processedRequests.contains(r));
             }
         }
    +
    +    /**
    +     * In the following test, we verify that we can handle the case that we got a commit
    +     * of a request we never seen since the session that we just established. This can
happen
    +     * when a session is just established and there is request waiting to be committed
in the
    +     * in the session queue but it sees a commit for a request that belongs to the previous
connection.
    +     */
    +    @Test(timeout = 1000)
    +    public void noCrashOnCommittedRequestsOfUnseenRequestTest() throws Exception {
    +        final String path = "/noCrash/OnCommittedRequests/OfUnseenRequestTest";
    +        final int numberofReads = 10;
    +        final int sessionid = 0x123456;
    +        final int firstCXid = 0x100;
    +        int readReqId = firstCXid;
    +        processor.stoppedMainLoop = true;
    +        HashSet<Request> localRequests = new HashSet<Request>();
    +        // queue the blocking write request to queuedRequests
    +        Request firstCommittedReq = newRequest(
    +                new CreateRequest(path, new byte[0], Ids.OPEN_ACL_UNSAFE,
    +                        CreateMode.PERSISTENT_SEQUENTIAL.toFlag()),
    +                OpCode.create, sessionid, readReqId++);
    +        processor.queuedRequests.add(firstCommittedReq);
    +        localRequests.add(firstCommittedReq);
    +
    +        // queue read requests to queuedRequests
    +        for (; readReqId <= numberofReads+firstCXid; ++readReqId) {
    +            Request readReq = newRequest(new GetDataRequest(path, false),
    +                    OpCode.getData, sessionid, readReqId);
    +            processor.queuedRequests.add(readReq);
    +            localRequests.add(readReq);
    +        }
    +
    +        //run once
    +        Assert.assertTrue(processor.queuedRequests.containsAll(localRequests));
    +        processor.initThreads(numberofReads* 2);
    +        processor.run();
    +
    +        //We verify that the processor is waiting for the commit
    +        Assert.assertTrue(processedRequests.isEmpty());
    +
    +        // We add a commit that belongs to the same session but with smaller cxid,
    +        // i.e., commit of an update from previous connection of this session.
    +        Request preSessionCommittedReq = newRequest(
    +                new CreateRequest(path, new byte[0], Ids.OPEN_ACL_UNSAFE,
    +                        CreateMode.PERSISTENT_SEQUENTIAL.toFlag()),
    +                OpCode.create, sessionid, firstCXid - 2);
    +        processor.committedRequests.add(preSessionCommittedReq);
    +        processor.committedRequests.add(firstCommittedReq);
    +        processor.run();
    +
    +        //We verify that the commit processor processed the old commit prior to the newer
messages
    +        Assert.assertTrue(processedRequests.peek() == preSessionCommittedReq);
    +
    +        processor.run();
    +
    +        //We verify that the commit processor handle all messages.
    +        Assert.assertTrue(processedRequests.containsAll(localRequests));
    +    }
    +
    +    /**
    +     * In the following test, we verify if we handle the case in which we get a commit
    +     * for a request that has higher Cxid than the one we are waiting. This can happen
    +     * when a session connection is lost but there is a request waiting to be committed
in the
    +     * session queue. However, since the session has moved, new requests can get to
    +     * the leader out of order. Hence, the commits can also arrive "out of order" w.r.t.
cxid.
    +     * We should commit the requests according to the order we receive from the leader,
i.e., wait for the relevant commit.
    --- End diff --
    
    1) I think its worth while explaining the two scenarios in the CommitProcessor.java comments.
I mean how its possible to get higher cxid and lower cxid from leader compared to what you
expect. Including the part where requests may reach the leader out of cxid order if session
is moved. 
    
    2) I wonder if in this case (receiving higher cxid from leader) we should also empty the
local pending requests ? What's the value of keeping the requests in the local queue ?


> Fix a crashing bug in the mixed workloads commit processor
> ----------------------------------------------------------
>
>                 Key: ZOOKEEPER-2684
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2684
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.6.0
>         Environment: with pretty heavy load on a real cluster
>            Reporter: Ryan Zhang
>            Assignee: Ryan Zhang
>            Priority: Blocker
>         Attachments: ZOOKEEPER-2684.patch
>
>
> We deployed our build with ZOOKEEPER-2024 and it quickly started to crash with the following
error
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:24:42,305 - ERROR [CommitProcessor:2] -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268)
– Got cxid 0x119fa expected 0x11fc5 for client session id 1009079ba470055
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:32:04,746 - ERROR [CommitProcessor:2] -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268)
– Got cxid 0x698 expected 0x928 for client session id 4002eeb3fd0009d
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:34:46,648 - ERROR [CommitProcessor:2] -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268)
– Got cxid 0x8904 expected 0x8f34 for client session id 51b8905c90251
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:43:46,834 - ERROR [CommitProcessor:2] -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268)
– Got cxid 0x3a8d expected 0x3ebc for client session id 2051af11af900cc
> clearly something is not right in the new commit processor per session queue implementation.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message