I am seeing high latencies (the OpsCenter homepage shows an average of about 30-60 ms). They destabilize my front-end servers, where queries stack up while waiting for C* to answer. This is on the following C* 1.1.6 cluster:
Address        DC       Rack  Status  State   Load       Owns     Token
10.208.45.173  eu-west  1b    Up      Normal  297.02 GB  100.00%  0
10.208.40.6    eu-west  1b    Up      Normal  292.91 GB  100.00%  56713727820156407428984779325531226112
10.208.47.135  eu-west  1b    Up      Normal  307.96 GB  100.00%  113427455640312814857969558651062452224
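As a sanity check on the ring above, the three tokens look evenly spaced. A minimal Python sketch, assuming the RandomPartitioner (token range 0 to 2**127) and using a hypothetical helper named balanced_tokens, compares them with ideally balanced tokens:

```python
# Assumption: RandomPartitioner, whose tokens span 0 .. 2**127.
TOKEN_SPACE = 2 ** 127


def balanced_tokens(node_count):
    """Evenly spaced tokens for node_count nodes (hypothetical helper)."""
    return [i * TOKEN_SPACE // node_count for i in range(node_count)]


# Tokens copied from the ring listing above.
ring_tokens = [
    0,
    56713727820156407428984779325531226112,
    113427455640312814857969558651062452224,
]

ideal = balanced_tokens(len(ring_tokens))

# Largest deviation from the ideal position, as a fraction of the whole ring.
max_drift = max(abs(a - b) for a, b in zip(ring_tokens, ideal)) / TOKEN_SPACE

for actual, wanted in zip(ring_tokens, ideal):
    print(actual, wanted)
print("max drift as fraction of ring:", max_drift)
```

If that assumption holds, the drift is far less than a billionth of the ring, so token imbalance is unlikely to be what causes the latencies.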
I run on 3 AWS m1.xlarge instances with mostly the default DataStax AMI node configuration, except for MAX_HEAP_SIZE="8G" and HEAP_NEWSIZE="400M" (I was under regular memory pressure with the default 4 GB heap, maybe because of bloom filters). RF = 3, CL = QUORUM for both reads and writes.
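For reference, here is the arithmetic behind RF = 3 with QUORUM as a small Python sketch. The function name quorum is only illustrative; the formula floor(RF / 2) + 1 is the usual quorum definition.

```python
def quorum(replication_factor):
    """Replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


rf = 3     # replication factor from the setup above
nodes = 3  # cluster size from the ring listing

needed = quorum(rf)
print("replicas required per QUORUM request:", needed)
print("replicas that may be down:", rf - needed)
# With RF equal to the node count, every node stores a full copy of the data.
print("every node holds all data:", rf == nodes)
```

So each request has to wait for two of the three replicas, and every node holds the full data set, which matches the 100% ownership in the ring listing.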
The load average is high, ranging from 4 to 15 with an average of 8, mainly because of iowait, which can reach 40-60%.
Extract from "iostat -mx 5 10":
avg-cpu:  %user   %nice  %system  %iowait  %steal   %idle
          16.66    0.00     4.82    35.47    0.21   42.85
I use compression and SizeTieredCompactionStrategy for all of my CFs.
A typical CF:
create column family active_product
with column_type = 'Standard'
and comparator = 'UTF8Type'
and default_validation_class = 'UTF8Type'
and key_validation_class = 'UTF8Type'
and read_repair_chance = 0.1
and dclocal_read_repair_chance = 0.0
and gc_grace = 864000
and min_compaction_threshold = 4
and max_compaction_threshold = 12
and replicate_on_write = true
and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
and caching = 'KEYS_ONLY'
and bloom_filter_fp_chance = 0.01
and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
And here is a typical counter CF:
create column family algo_product_view
with column_type = 'Standard'
and comparator = 'UTF8Type'
and default_validation_class = 'CounterColumnType'
and key_validation_class = 'UTF8Type'
and read_repair_chance = 0.1
and dclocal_read_repair_chance = 0.0
and gc_grace = 864000
and min_compaction_threshold = 4
and max_compaction_threshold = 12
and replicate_on_write = true
and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
and caching = 'KEYS_ONLY'
and bloom_filter_fp_chance = 0.01
and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
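Since I suspect bloom filters of contributing to the heap pressure, here is a rough Python estimate of their size at bloom_filter_fp_chance = 0.01. This is only a sketch: it uses the textbook formula for an optimally sized bloom filter, not Cassandra's actual implementation, and the key count is a made-up placeholder that would need to be replaced with real numbers.

```python
import math


def bloom_bits_per_key(fp_chance):
    """Bits per key for an optimally sized bloom filter (textbook formula)."""
    return -math.log(fp_chance) / (math.log(2) ** 2)


def bloom_size_mb(key_count, fp_chance):
    """Approximate filter size in MB for key_count keys."""
    return key_count * bloom_bits_per_key(fp_chance) / 8 / 1024 ** 2


fp_chance = 0.01                   # value from the CF definitions above
hypothetical_keys = 1_000_000_000  # placeholder, NOT a measured value

print("bits per key: %.2f" % bloom_bits_per_key(fp_chance))
print("approx size: %.0f MB" % bloom_size_mb(hypothetical_keys, fp_chance))
```

At a 1% false-positive chance this works out to roughly 9.6 bits per key, so the total depends almost entirely on how many row keys each node really holds.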
I attach my cfhistograms, proxyhistograms, cfstats and tpstats, hoping there is a clue somewhere in there, even though I was unable to learn anything from them myself.
These latencies are quite annoying. I hope you can help me figure out what I am doing wrong or how I can tune Cassandra better.