cassandra-commits mailing list archives

From "Mike Heffner (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-7522) Bootstrapping a single node spikes cluster-wide p95 latencies
Date Wed, 09 Jul 2014 02:29:05 GMT
Mike Heffner created CASSANDRA-7522:
---------------------------------------

             Summary: Bootstrapping a single node spikes cluster-wide p95 latencies
                 Key: CASSANDRA-7522
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7522
             Project: Cassandra
          Issue Type: Bug
          Components: Core
         Environment: AWS, i2.2xlarge HVM instances
            Reporter: Mike Heffner


We've recently run some tests with Cassandra 2.0.9, largely because we are interested in the
streaming improvements in the 2.0.x series (see CASSANDRA-5726). However, our results so far
show that even with 2.0.x, the impact of streaming is still quite large and hard to control.

Our test environment was a 9-node 2.0.9 ring running on AWS i2.2xlarge HVM instances with
Oracle JVM 1.7.0_55. Each node is configured with vnodes (256 tokens per node). We tested
expanding this ring to 12 nodes, bootstrapping each new node with a different combination of
throttle settings applied across the ring:

1st node:
* no throttle, stream/compaction throughput = 0

2nd node:
* stream throughput = 200
* compaction throughput = 256

3rd node:
* stream throughput = 50
* compaction throughput = 65
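The per-node throttles above can be applied at runtime with nodetool, or persisted in
cassandra.yaml. As a sketch for the third node's settings (units per the 2.0.x defaults:
stream throughput in megabits/s, compaction throughput in MB/s):

{code}
# Third-node settings: stream throughput 50 Mb/s, compaction throughput 65 MB/s.
# Applied at runtime on a node (takes effect without a restart):
nodetool setstreamthroughput 50
nodetool setcompactionthroughput 65

# Or persisted in cassandra.yaml (requires a restart):
#   stream_throughput_outbound_megabits_per_sec: 50
#   compaction_throughput_mb_per_sec: 65
{code}

Setting either value to 0 disables that throttle, which is how the first node was bootstrapped.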

This is a graph of p95 write latencies (the ring was not taking reads) showing each node
bootstrapping, left to right. The p95 latencies rise from roughly 200ms to around 500ms.

http://snapshots.librato.com/instrument/5j9l3qiq-7462.png

The write latencies appear to be largely driven by CPU as shown by:

http://snapshots.librato.com/instrument/xsfb688i-7463.png

Network graphs show that the joining nodes follow approximately the same bandwidth pattern:

http://snapshots.librato.com/instrument/ljvkvg6y-7464.png

What are the expected performance behaviors during bootstrapping / ring expansion? The storage
loads in this test were fairly small, so the spikes were short-lived; at a much larger
production load we would need to sustain these spikes for hours. As far as we could tell, the
throttle controls did not help.
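To put the throttle/duration tradeoff in rough numbers (hypothetical sizing, not measured in
this test): streaming 500 GB into a joining node at the 50 Mb/s throttle would take the better
part of a day of sustained load:

{code}
# Hypothetical: time to stream 500 GB to a bootstrapping node at 50 Mb/s
data_gb=500        # assumed per-node data to stream (not from this test)
throttle_mbit=50   # stream throughput throttle, megabits/s
megabits=$(( data_gb * 8 * 1024 ))    # GB -> megabits
seconds=$(( megabits / throttle_mbit ))
hours=$(( seconds / 3600 ))
echo "~${hours} hours of sustained streaming"   # ~22 hours
{code}

Unthrottled, the transfer finishes far sooner but the latency spike is correspondingly
sharper, which is exactly the tradeoff we were trying to tune with these settings.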

These are our current config changes:

{code}
-concurrent_reads: 32
-concurrent_writes: 32
+concurrent_reads: 64
+concurrent_writes: 64

-memtable_flush_queue_size: 4
+memtable_flush_queue_size: 5

-rpc_server_type: sync
+rpc_server_type: hsha

-#concurrent_compactors: 1
+concurrent_compactors: 6

-cross_node_timeout: false
+cross_node_timeout: true
-# phi_convict_threshold: 8
+phi_convict_threshold: 12

-endpoint_snitch: SimpleSnitch
+endpoint_snitch: Ec2Snitch

-internode_compression: all
+internode_compression: none
{code}

Heap settings:

{code}
export MAX_HEAP_SIZE="10G"
export HEAP_NEWSIZE="2G"
{code}




--
This message was sent by Atlassian JIRA
(v6.2#6252)
