zookeeper-dev mailing list archives

From "Jie Huang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ZOOKEEPER-3243) Add server side request throttling
Date Fri, 11 Jan 2019 19:35:00 GMT
Jie Huang created ZOOKEEPER-3243:

             Summary: Add server side request throttling
                 Key: ZOOKEEPER-3243
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3243
             Project: ZooKeeper
          Issue Type: Improvement
          Components: server
            Reporter: Jie Huang
             Fix For: 3.6.0

Ongoing performance investigation at Facebook has demonstrated that ZooKeeper is easily overwhelmed
by spikes in connection rates and/or write request rates. ZooKeeper performance gets progressively
worse, clients time out and try to reconnect (exacerbating the problem), and things enter a
death spiral. To solve this problem, we need to add load protection to ZooKeeper via rate
limiting and work shedding.

This JIRA task adds a new request throttling mechanism (RequestThrottler) to ZooKeeper in
hopes of preventing ZooKeeper from becoming overwhelmed during request spikes.
When enabled, the RequestThrottler limits the number of outstanding requests currently submitted
to the request processor pipeline.
The throttler augments the limit imposed by the globalOutstandingLimit that is enforced by
the connection layer (NIOServerCnxn, NettyServerCnxn). The connection layer limit applies
backpressure against the TCP connection by disabling selection on connections once the request
limit is reached. However, the connection layer always allows a connection to send at least
one request before disabling selection on that connection. Thus, in a scenario with 40000
client connections, the total number of requests inflight may be as high as 40000 even if
the globalOutstandingLimit was set lower.
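
The arithmetic behind that scenario can be written down as a tiny model. This is only a sketch under a simplifying assumption: every connection gets exactly one request through before selection is disabled, so the worst case is whichever is larger, the connection count or the configured limit. The class and method names are hypothetical and do not come from the ZooKeeper code base; the limit of 1000 used in the comment is just an example value.

```java
// Hypothetical helper, not part of ZooKeeper: a simplified worst-case model
// of in-flight requests under the connection-layer limit described above.
final class InflightBoundSketch {
    private InflightBoundSketch() {
    }

    // Each connection may always send at least one request, so the limit
    // cannot hold the total below the number of open connections.
    static int worstCaseInflight(int connections, int globalOutstandingLimit) {
        return Math.max(connections, globalOutstandingLimit);
    }
}
// Example: worstCaseInflight(40000, 1000) evaluates to 40000.
```
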
The RequestThrottler addresses this issue by adding additional queueing. When enabled, client
connections no longer submit requests directly to the request processor pipeline but instead
to the RequestThrottler. The RequestThrottler is then responsible for issuing requests to
the request processors, and enforces a separate maxRequests limit. If the total number of
outstanding requests is higher than maxRequests, the throttler repeatedly stalls for
stallTime milliseconds until the count is back under the limit.
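
The queue-then-issue behaviour described above could look roughly like the following. This is a minimal sketch, not the RequestThrottler from the patch: the class name, method names, and the choices to stall once the count reaches maxRequests and to treat a non-positive maxRequests as "no limit" are all assumptions made for illustration. It also assumes a single thread calls issueNext().

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch of a throttling stage placed in front of a request pipeline.
final class ThrottlerSketch<R> {
    private final Queue<R> submitted = new ConcurrentLinkedQueue<>();
    private final AtomicInteger inFlight = new AtomicInteger();
    private final int maxRequests;      // assumption: <= 0 means "no limit"
    private final long stallTimeMs;
    private final Consumer<R> pipeline; // stands in for the request processors

    ThrottlerSketch(int maxRequests, long stallTimeMs, Consumer<R> pipeline) {
        this.maxRequests = maxRequests;
        this.stallTimeMs = stallTimeMs;
        this.pipeline = pipeline;
    }

    // Connections hand requests to the throttler instead of the pipeline.
    void submit(R request) {
        submitted.add(request);
    }

    // The pipeline reports completion so the outstanding count can drop.
    void requestFinished() {
        inFlight.decrementAndGet();
    }

    int inFlight() {
        return inFlight.get();
    }

    // Issues the next queued request, stalling in stallTimeMs steps while the
    // limit is reached. Returns false if nothing is queued or the thread is
    // interrupted while stalled (the request then stays queued).
    boolean issueNext() {
        if (submitted.peek() == null) {
            return false;
        }
        while (maxRequests > 0 && inFlight.get() >= maxRequests) {
            try {
                Thread.sleep(stallTimeMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        R request = submitted.poll();
        inFlight.incrementAndGet();
        pipeline.accept(request);
        return true;
    }
}
```

In a real server the issuing loop would run on its own thread, and requestFinished() would be called from the end of the processor chain; both are left out here to keep the sketch short.
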
The RequestThrottler can also optionally drop stale requests rather than submit them to the
processor pipeline. A stale request is a request sent by a connection that is already closed,
and/or a request whose latency will end up being higher than its associated session timeout.
To preserve ordering guarantees, if a request from a connection is ever dropped, that connection
is closed and flagged as invalid. All subsequent in-flight requests from that connection are
then dropped as well.
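
The stale-drop rule can be sketched as a single predicate. Everything named here (Conn, Req, shouldDrop, the two boolean switches) is hypothetical and only illustrates the description above. One simplification to note: the text talks about a request whose latency "will end up" exceeding the session timeout, while this sketch only checks whether the time already spent waiting exceeds it.

```java
// Hypothetical sketch of the stale-request check; not the code from the patch.
final class StaleDropSketch {
    private StaleDropSketch() {
    }

    static final class Conn {
        final long sessionTimeoutMs;
        boolean closed;  // connection already closed
        boolean invalid; // set once any request from this connection is dropped

        Conn(long sessionTimeoutMs) {
            this.sessionTimeoutMs = sessionTimeoutMs;
        }
    }

    static final class Req {
        final Conn conn;
        final long createdAtMs;

        Req(Conn conn, long createdAtMs) {
            this.conn = conn;
            this.createdAtMs = createdAtMs;
        }
    }

    // Returns true if the request should be dropped instead of being issued.
    // The two flags model the individually switchable staleness checks.
    static boolean shouldDrop(Req r, long nowMs, boolean dropOnClosed, boolean dropOnLatency) {
        Conn c = r.conn;
        boolean stale = c.invalid
                || (dropOnClosed && c.closed)
                || (dropOnLatency && nowMs - r.createdAtMs > c.sessionTimeoutMs);
        if (stale) {
            // Dropping one request invalidates the connection, so every later
            // request from it is dropped too and ordering is never violated.
            c.closed = true;
            c.invalid = true;
        }
        return stale;
    }
}
```
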
The notion of staleness is configurable: connection staleness and latency staleness can
be individually enabled or disabled. Both these settings and the various throttle settings (limit,
stall time, stale drop) can be configured via system properties as well as at runtime via
The throttler has been tested and benchmarked at Facebook.
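
For the system-property route mentioned above, loading the throttle settings at startup might look like the sketch below. The property names and default values are placeholders invented for this example; the issue text does not name the real properties, so they should not be read as ZooKeeper's actual configuration keys. The runtime reconfiguration path is not shown because the text above does not say which mechanism it uses.

```java
// Hypothetical settings holder; property names and defaults are placeholders.
final class ThrottleSettingsSketch {
    final int maxRequests;
    final long stallTimeMs;
    final boolean dropStale;

    ThrottleSettingsSketch(int maxRequests, long stallTimeMs, boolean dropStale) {
        this.maxRequests = maxRequests;
        this.stallTimeMs = stallTimeMs;
        this.dropStale = dropStale;
    }

    // Reads the three throttle settings from JVM system properties,
    // falling back to the placeholder defaults when a property is unset.
    static ThrottleSettingsSketch fromSystemProperties() {
        int max = Integer.getInteger("sketch.throttle.maxRequests", 0);
        long stall = Long.getLong("sketch.throttle.stallTimeMs", 100L);
        boolean drop = Boolean.parseBoolean(
                System.getProperty("sketch.throttle.dropStale", "true"));
        return new ThrottleSettingsSketch(max, stall, drop);
    }
}
```

With this sketch, starting the JVM with -Dsketch.throttle.maxRequests=500 would yield a settings object whose maxRequests is 500 while the other two values keep their defaults.
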
