From Dave Galbraith <>
Subject Really high read latency
Date Mon, 23 Mar 2015 04:56:00 GMT
Hi! So I've got a table like this:

CREATE TABLE "default".metrics (row_time int,attrs varchar,offset int,value
double, PRIMARY KEY(row_time, attrs, offset)) WITH COMPACT STORAGE AND
bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND comment='' AND
dclocal_read_repair_chance=0 AND gc_grace_seconds=864000 AND
index_interval=128 AND read_repair_chance=1 AND replicate_on_write='true'
AND populate_io_cache_on_flush='false' AND default_time_to_live=0 AND
speculative_retry='NONE' AND memtable_flush_period_in_ms=0 AND
compression={'sstable_compression':'LZ4Compressor'};

and I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with 4 GB
of heap space. It's time-series data: I increment "row_time" each day,
"attrs" is additional identifying information about each series, and
"offset" is the number of milliseconds into the day for each data point.
For the past 5 days, I've been inserting 3k points/second distributed
across 100k distinct "attrs" values. Now when I try to run queries on this
data that look like

"SELECT * FROM "default".metrics WHERE row_time = 5 AND attrs =

it takes an absurdly long time and sometimes just times out. I ran
"nodetool cfstats default" and here's what I get:

Keyspace: default
    Read Count: 59
    Read Latency: 397.12523728813557 ms.
    Write Count: 155128
    Write Latency: 0.3675690719921613 ms.
    Pending Flushes: 0
        Table: metrics
        SSTable count: 26
        Space used (live): 35146349027
        Space used (total): 35146349027
        Space used by snapshots (total): 0
        SSTable Compression Ratio: 0.10386468749216264
        Memtable cell count: 141800
        Memtable data size: 31071290
        Memtable switch count: 41
        Local read count: 59
        Local read latency: 397.126 ms
        Local write count: 155128
        Local write latency: 0.368 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 2856
        Compacted partition minimum bytes: 104
        Compacted partition maximum bytes: 36904729268
        Compacted partition mean bytes: 986530969
        Average live cells per slice (last five minutes): 501.66101694915255
        Maximum live cells per slice (last five minutes): 502.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

Ouch! 400ms of read latency, orders of magnitude higher than it has any
right to be. How could this have happened? Is there something fundamentally
broken about my data model? Thanks!
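
A back-of-the-envelope check of the figures above may help frame the
question. In PRIMARY KEY(row_time, attrs, offset), only the first column,
row_time, is the partition key, so every point written on a given day lands
in the same partition regardless of its "attrs". The Python sketch below
just does that arithmetic. It assumes the 3k points/second rate is
sustained around the clock, and all of its names are made up for
illustration; it is a rough estimate, not a diagnosis.

```python
# Rough estimate of how much data ends up in one partition per day.
# Assumptions (not measured): the 3k points/second rate is constant all day
# and points are spread evenly across the 100k distinct "attrs" values.
# All names below are illustrative only.

POINTS_PER_SECOND = 3_000
SECONDS_PER_DAY = 24 * 60 * 60
DISTINCT_ATTRS = 100_000

# "Compacted partition maximum bytes" as reported by nodetool cfstats above.
MAX_PARTITION_BYTES = 36_904_729_268


def points_per_day(rate=POINTS_PER_SECOND):
    """Points written in one day, i.e. under one row_time value."""
    return rate * SECONDS_PER_DAY


def points_per_series_per_day(rate=POINTS_PER_SECOND, series=DISTINCT_ATTRS):
    """Average points per day for a single "attrs" value."""
    return points_per_day(rate) / series


# row_time alone is the partition key, so one day is one partition.
cells_in_one_partition = points_per_day()
cells_for_one_series = points_per_series_per_day()
max_partition_gib = MAX_PARTITION_BYTES / 2**30

print(cells_in_one_partition)  # 259200000
print(cells_for_one_series)    # 2592.0
print(f"{max_partition_gib:.1f} GiB")
```

Under these assumptions a single day's partition holds roughly 259 million
points, while a query for one "attrs" value only needs about 2,600 of them.
That is in line with the "Compacted partition maximum bytes" figure above,
which works out to tens of gigabytes for one partition.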

