cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CASSANDRA-8035) 2.0.x repair causes large increase in client latency even for small datasets
Date Mon, 24 Aug 2015 15:57:46 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Ellis resolved CASSANDRA-8035.
---------------------------------------
       Resolution: Cannot Reproduce
    Fix Version/s:     (was: 2.0.x)

Closing as Cannot Reproduce since 2.0 is EOL.  Please reopen if you see this on 2.1+.

> 2.0.x repair causes large increase in client latency even for small datasets
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8035
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8035
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: c-2.0.10, 3 nodes per @ DCs.  Load < 50 MB
>            Reporter: Chris Burroughs
>         Attachments: cl-latency.png, cpu-idle.png, keyspace-99p.png, row-cache-hit-rate.png
>
>
> Running repair causes a significant increase in client latency even when the total amount of data per node is very small.
> Each node serves about 900 req/s, and during normal operation the 99p Client Request Latency is less than 4 ms and usually less than 1 ms.  During repair the latency increases to 4-10 ms on all nodes.  I am unable to find any resource-based explanation for this.  Several graphs are attached to summarize.  Repair started at about 10:10 and finished around 10:25.
>  * Client Request Latency goes up significantly.
>  * Local keyspace read latency is flat.  I interpret this to mean that it's purely coordinator overhead that's causing the slowdown.
>  * Row cache hit rate is unaffected (and is very high).  Given these two metrics, I don't think there is any doubt that virtually all reads are being satisfied in memory.
>  * There is plenty of available CPU.  Aggregate CPU usage (mostly NIC-related) did go up during repair.
> Having more/larger keyspaces seems to make it worse.  Having two keyspaces on this cluster (still with total size << RAM) caused larger increases in latency, which would have made for better graphs, but it pushed the cluster well outside of SLAs and we needed to move the second keyspace.
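The coordinator-vs-replica latency distinction drawn in the report can be checked from the command line with stock `nodetool` subcommands that exist in the 2.0 line.  A minimal sketch, assuming `nodetool` is on the PATH and using hypothetical keyspace/table names `ks`/`tbl` (these require a live cluster to run):

```shell
# Coordinator-level read/write latency percentiles, i.e. what clients see:
nodetool proxyhistograms

# Local (replica-side) read latency for one table; if this stays flat
# while proxyhistograms climbs, the overhead is on the coordinator path:
nodetool cfhistograms ks tbl

# Node-level stats, including the row cache hit rate:
nodetool info

# Kick off the repair being measured (-pr limits it to primary ranges):
nodetool repair -pr ks
```

Sampling the first three before, during, and after the repair is enough to reproduce the comparison the attached graphs make.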



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
