zookeeper-dev mailing list archives

From "Kfir Lev-Ari (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2024) Major throughput improvement with mixed workloads
Date Sat, 10 Dec 2016 21:32:58 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15738520#comment-15738520 ]

Kfir Lev-Ari commented on ZOOKEEPER-2024:
-----------------------------------------

[~randgalt], I've made ZooNet public; you can find it [here|https://github.com/kfirlevari/ZooNet].

I also wrote a systest for ZooNet, similar to the one in [ZOOKEEPER-2023|https://issues.apache.org/jira/browse/ZOOKEEPER-2023],
that supports multiple ZKs. For simplicity, I've also added a small script that runs the
ZooNet systest ([this|https://github.com/kfirlevari/ZooNet/blob/trunk/README.md] explains
how to run it).

Obviously, this implementation can be improved, e.g., with better support for async ops (see [here|https://github.com/kfirlevari/ZooNet/issues/1]
for details).
Let me know what you think ;)

Cheers,
Kfir

> Major throughput improvement with mixed workloads
> -------------------------------------------------
>
>                 Key: ZOOKEEPER-2024
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2024
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: quorum, server
>            Reporter: Kfir Lev-Ari
>            Assignee: Kfir Lev-Ari
>             Fix For: 3.6.0
>
>         Attachments: ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch,
>                      ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch,
>                      ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch
>
>
> The patch is applied to the commit processor, and solves two problems:
> 1. Stalling - once the commit processor encounters a local write request, it stalls local
> processing of all sessions until it receives a commit of that request from the leader.
> In mixed workloads, this severely hampers performance, as it does not allow read-only
> sessions to proceed faster than read-write ones.
> 2. Starvation - as long as there are read requests to process, older remote committed
> write requests are starved.
> This occurs due to a bug fix (https://issues.apache.org/jira/browse/ZOOKEEPER-1505) that
> forces processing of local read requests before handling any committed write. The problem
> manifests only under high local read load.
> Our solution solves these two problems. It improves throughput in mixed workloads (in
> our tests, by up to 8x) and reduces latency, especially at higher percentiles (i.e., the
> slowest requests).
> The main idea is to separate sessions that inherently need to stall in order to enforce
> order semantics from ones that do not need to stall. To this end, we add data structures
> for buffering and managing pending requests of stalled sessions; these requests are moved
> off the critical path into these data structures, allowing continued processing of
> unaffected sessions.
> Please see the docs:
> 1) https://goo.gl/m1cINJ - includes a detailed description of the new commit processor
> algorithm.
> 2) The attached patch implements our solution, and a collection of related unit tests
> (https://reviews.apache.org/r/25160)
> 3) https://goo.gl/W0xDUP - performance results.
> (See https://issues.apache.org/jira/browse/ZOOKEEPER-2023 for the corresponding new system
> test that produced these performance measurements)
>
> See also https://issues.apache.org/jira/browse/ZOOKEEPER-1609
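
To make the buffering idea in the quoted description concrete, here is a minimal Python sketch. It is not the patch's code (the patch is Java and changes ZooKeeper's commit processor); the class ToyCommitProcessor and its methods submit and commit are hypothetical names used only for illustration. The sketch assumes a single thread, assumes each commit call matches the oldest pending write of the given session, and ignores the ordering of commits across sessions and commits of remote writes.

```python
from collections import deque


class ToyCommitProcessor:
    """Toy model (hypothetical): a session stalls only behind its own pending write."""

    def __init__(self):
        self.blocked = {}    # session id -> deque of (op, is_write) waiting on that session's write
        self.processed = []  # (session, op) pairs in the order they were applied

    def submit(self, session, op, is_write):
        if session in self.blocked:
            # Session is already stalled: queue behind its pending write to keep session order.
            self.blocked[session].append((op, is_write))
        elif is_write:
            # Local write: stall this session only, until the leader's commit arrives.
            self.blocked[session] = deque([(op, True)])
        else:
            # Read from an unaffected session: process immediately.
            self.processed.append((session, op))

    def commit(self, session):
        """Assumed to be called when the leader commits the session's oldest pending write."""
        queue = self.blocked[session]
        op, _ = queue.popleft()
        self.processed.append((session, op))
        # Drain buffered reads until the next uncommitted write (if any).
        while queue and not queue[0][1]:
            op, _ = queue.popleft()
            self.processed.append((session, op))
        if not queue:
            del self.blocked[session]


cp = ToyCommitProcessor()
cp.submit("A", "write-1", True)   # session A stalls on its write
cp.submit("A", "read-1", False)   # buffered behind A's write
cp.submit("B", "read-2", False)   # session B is unaffected and proceeds
cp.commit("A")                    # commit arrives; A's buffer drains
print(cp.processed)
# [('B', 'read-2'), ('A', 'write-1'), ('A', 'read-1')]
```

In this toy model, session B's read is applied before session A's write commits, while session A still sees its own operations in submission order.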



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
