zookeeper-dev mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-3072) Race condition in throttling
Date Thu, 19 Jul 2018 08:45:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549008#comment-16549008 ]

Hadoop QA commented on ZOOKEEPER-3072:

+1 overall.  GitHub Pull Request  Build

    +1 @author.  The patch does not contain any @author tags.

    +0 tests included.  The patch appears to be a documentation patch that doesn't require

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1967//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1967//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1967//console

This message is automatically generated.

> Race condition in throttling
> ----------------------------
>                 Key: ZOOKEEPER-3072
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3072
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.0, 3.5.1, 3.5.2, 3.5.3, 3.5.4
>            Reporter: Botond Hejj
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
> There is a race condition in the server throttling code: disableRecv() can be called
after the enableRecv() that was meant to undo it.
> Basically, the I/O worker thread does this in processPacket: [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L1102]
>                 submitRequest(si);
>             }
>         }
>         cnxn.incrOutstandingRequests(h);
>     }
> incrOutstandingRequests() checks whether the limit has been breached and, if so, turns on throttling: [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java#L384]
> submitRequest() creates a logical request and enqueues it so that the Processor thread
can pick it up. After dequeuing the request, the Processor thread does the necessary handling
and then calls this [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java#L459] :
>             cnxn.sendResponse(hdr, rsp, "response");
> In sendResponse(), it first appends the response to the outgoing buffer and then checks whether
un-throttling is needed: [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java#L708]
> However, if there is a context switch between submitRequest() and cnxn.incrOutstandingRequests(),
and the Processor thread completes the cnxn.sendResponse() call before the I/O thread switches back,
then enableRecv() runs before disableRecv(). enableRecv() fails its CAS operation while
disableRecv() succeeds, and the result is a deadlock: un-throttling is needed to let requests in,
sendResponse() is needed to trigger un-throttling, but sendResponse() requires an incoming
message. From that point on, the ZK server no longer selects the affected client socket for read,
leading to the observed client-side failure in the subject.
> If you would like to reproduce this, setting globalOutstandingLimit down to 1 makes it
easier, because throttling then starts with fewer requests.
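
The interleaving described in the quoted issue can be replayed deterministically. The following is a minimal sketch, not the ZooKeeper source: the class ThrottleRaceSketch and its replay() helper are hypothetical, and the model assumes only what the description states, namely that disableRecv() and enableRecv() each flip one shared flag with a compare-and-set, and that the limit check in the I/O thread has already decided to throttle.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical, simplified model of the per-connection throttle flag. */
class ThrottleRaceSketch {
    // true means the server has stopped selecting this socket for read
    private final AtomicBoolean throttled = new AtomicBoolean(false);

    /** I/O thread side: succeeds only if the connection is not yet throttled. */
    boolean disableRecv() {
        return throttled.compareAndSet(false, true);
    }

    /** Processor thread side: succeeds only if the connection is throttled. */
    boolean enableRecv() {
        return throttled.compareAndSet(true, false);
    }

    boolean isThrottled() {
        return throttled.get();
    }

    /**
     * Replays the two calls in a fixed order and reports whether the
     * connection is left throttled afterwards.
     */
    static boolean replay(boolean responseBeforeIncrement) {
        ThrottleRaceSketch cnxn = new ThrottleRaceSketch();
        if (responseBeforeIncrement) {
            // Racy order: the response is sent before the I/O thread resumes.
            cnxn.enableRecv();   // CAS fails, the flag is still false
            cnxn.disableRecv();  // CAS succeeds, nothing is left to undo it
        } else {
            // Intended order: throttle first, un-throttle on the response.
            cnxn.disableRecv();
            cnxn.enableRecv();
        }
        return cnxn.isThrottled();
    }

    public static void main(String[] args) {
        System.out.println("intended order, left throttled: " + replay(false));
        System.out.println("racy order, left throttled: " + replay(true));
    }
}
```

In the racy order the flag stays set with no response pending to clear it, which matches the stuck connection described above. The sketch replays the calls sequentially instead of using real threads, so it shows the outcome of the bad interleaving but not how often it occurs.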

This message was sent by Atlassian JIRA
