cassandra-commits mailing list archives

From "Chris Burroughs (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8035) 2.0.x repair causes large increase in client latency even for small datasets
Date Wed, 15 Oct 2014 12:39:33 GMT


Chris Burroughs commented on CASSANDRA-8035:

This particular cluster has triggered GCInspector only a handful of times in the past two
weeks, and none during the relevant repair period.  I think that makes GC an unlikely culprit.
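
For reference, GCInspector only logs collections that exceed a duration threshold, so a fuller
picture of GC activity over the repair window can be read from the standard JVM GarbageCollector
MBeans.  A minimal sketch over remote JMX follows; the host is a placeholder and 7199 is only the
default Cassandra JMX port, neither value is taken from this ticket:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class GcActivityCheck {
        public static void main(String[] args) throws Exception {
            // Placeholder host; 7199 is Cassandra's default JMX port.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                // Standard JVM GC MBeans: cumulative collection counts and times.
                for (ObjectName gc : mbs.queryNames(
                        new ObjectName("java.lang:type=GarbageCollector,*"), null)) {
                    System.out.printf("%s: count=%s timeMs=%s%n",
                            gc.getKeyProperty("name"),
                            mbs.getAttribute(gc, "CollectionCount"),
                            mbs.getAttribute(gc, "CollectionTime"));
                }
            }
        }
    }

Sampling this before and after the repair window and diffing counts/times corroborates, rather
than replaces, the GCInspector observation above.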

> 2.0.x repair causes large increase in client latency even for small datasets
> ---------------------------------------------------------------------------
>                 Key: CASSANDRA-8035
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: c-2.0.10, 3 nodes per DC.  Load < 50 MB
>            Reporter: Chris Burroughs
>         Attachments: cl-latency.png, cpu-idle.png, keyspace-99p.png, row-cache-hit-rate.png
> Running repair causes a significant increase in client latency even when the total amount
> of data per node is very small.
> Each node handles 900 req/s, and during normal operations the 99p Client Request Latency is
> less than 4 ms and usually less than 1 ms.  During repair the latency increases to 4-10 ms on
> all nodes.  I am unable to find any resource-based explanation for this.  Several graphs are
> attached to summarize.  Repair started at about 10:10 and finished around 10:25.
>  * Client Request Latency goes up significantly.
>  * Local keyspace read latency is flat.  I interpret this to mean that it's purely coordinator
>    overhead that's causing the slowdown (see the JMX sketch after this quoted section).
>  * Row cache hit rate is unaffected (and is very high).  Between these two metrics I don't
>    think there is any doubt that virtually all reads are being satisfied in memory.
>  * There is plenty of available cpu.  Aggregate cpu used (mostly nic) did go up during repair.
> Having more/larger keyspaces seems to make it worse.  Having two keyspaces on this cluster
> (still with total size << RAM) caused larger increases in latency, which would have made for
> better graphs, but it pushed the cluster well outside of SLAs and we needed to move the second
> keyspace.
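
The second bullet above separates coordinator-level latency from local read latency; both are
exposed over JMX through the standard org.apache.cassandra.metrics MBeans in 2.0.x.  A minimal
sketch of reading the two 99th percentiles side by side follows.  The host, keyspace, and table
names are placeholders, not values from this ticket:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class RepairLatencyCheck {
        public static void main(String[] args) throws Exception {
            // Placeholder host; 7199 is Cassandra's default JMX port.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();

                // Coordinator-level read latency: what the Client Request Latency graph tracks.
                ObjectName coordinator = new ObjectName(
                        "org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency");
                // Local (replica) read latency for one table; reported flat during repair above.
                ObjectName local = new ObjectName("org.apache.cassandra.metrics:"
                        + "type=ColumnFamily,keyspace=my_keyspace,scope=my_table,name=ReadLatency");

                // Percentile attribute names follow the metrics-library JMX exposure.
                System.out.println("coordinator read 99p: "
                        + mbs.getAttribute(coordinator, "99thPercentile"));
                System.out.println("local read 99p:       "
                        + mbs.getAttribute(local, "99thPercentile"));
            }
        }
    }

If the coordinator percentile climbs during repair while the local one stays flat, that matches
the coordinator-overhead interpretation above; nodetool proxyhistograms and cfhistograms give a
rough command-line equivalent.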
