From feedly team <>
Subject tuning for read performance
Date Mon, 22 Oct 2012 18:05:49 GMT
I have a small 2-node Cassandra cluster that seems to be constrained by
read throughput. There are about 100 writes/s and 60 reads/s, mostly against
a skinny column family. Here are the cfstats for that family:

 SSTable count: 13
 Space used (live): 231920026568
 Space used (total): 231920026568
 Number of Keys (estimate): 356899200
 Memtable Columns Count: 1385568
 Memtable Data Size: 359155691
 Memtable Switch Count: 26
 Read Count: 40705879
 Read Latency: 25.010 ms.
 Write Count: 9680958
 Write Latency: 0.036 ms.
 Pending Tasks: 0
 Bloom Filter False Postives: 28380
 Bloom Filter False Ratio: 0.00360
 Bloom Filter Space Used: 874173664
 Compacted row minimum size: 61
 Compacted row maximum size: 152321
 Compacted row mean size: 1445
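
To put the raw cfstats counters in perspective, here is a minimal sketch that derives a few ratios from them. The input numbers are copied from the output above; the function name and dictionary keys are made up for illustration and are not part of any Cassandra tool.

```python
# Figures copied verbatim from the cfstats output above.
SPACE_USED_BYTES = 231920026568      # Space used (live)
ESTIMATED_KEYS = 356899200           # Number of Keys (estimate)
BLOOM_FILTER_BYTES = 874173664       # Bloom Filter Space Used


def derived_cfstats(space_bytes, keys, bloom_bytes):
    """Return a few derived ratios (hypothetical helper, for illustration only)."""
    return {
        "space_gib": space_bytes / 2**30,            # on-disk size in GiB
        "bytes_per_key": space_bytes / keys,         # on-disk bytes per estimated key
        "bloom_mib": bloom_bytes / 2**20,            # bloom filter size in MiB
        "bloom_bits_per_key": bloom_bytes * 8 / keys,
    }


stats = derived_cfstats(SPACE_USED_BYTES, ESTIMATED_KEYS, BLOOM_FILTER_BYTES)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Note that the key count is only an estimate, so the per-key figures are rough.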

iostat shows almost no write activity. Here's a typical line:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00  312.87    0.00     6.61     0.00    43.27    23.35  105.06   2.28  71.19
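
As a rough cross-check of the iostat line, the sketch below derives the average read size and an approximate utilization from the reported columns. It assumes avgrq-sz is reported in 512-byte sectors and that rMB/s uses 1024-based units. The last ratio additionally assumes that all 60 reads/s land on this one node, which may not be the case, so treat it as an upper bound.

```python
SECTOR_BYTES = 512  # assumption: avgrq-sz is expressed in 512-byte sectors

# Values copied from the iostat line above.
reads_per_sec = 312.87        # r/s
read_mb_per_sec = 6.61        # rMB/s
avg_request_sectors = 43.27   # avgrq-sz
service_time_ms = 2.28        # svctm

# Assumption: every client read (about 60/s) is served by this node.
client_reads_per_sec = 60

# Average size of one disk read, computed two independent ways.
avg_read_kib_from_throughput = read_mb_per_sec * 1024 / reads_per_sec
avg_read_kib_from_avgrq = avg_request_sectors * SECTOR_BYTES / 1024

# Utilization is roughly requests per second times service time per request.
estimated_util_pct = reads_per_sec * service_time_ms / 1000 * 100

# Disk reads issued per client read, under the assumption stated above.
disk_reads_per_client_read = reads_per_sec / client_reads_per_sec

print(f"avg read size (from throughput): {avg_read_kib_from_throughput:.2f} KiB")
print(f"avg read size (from avgrq-sz):   {avg_read_kib_from_avgrq:.2f} KiB")
print(f"estimated utilization:           {estimated_util_pct:.2f} %")
print(f"disk reads per client read:      {disk_reads_per_client_read:.2f}")
```

Both size estimates come out at a little under 22 KiB per disk read, and the estimated utilization is close to the reported %util.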

In addition, nodetool tpstats always shows pending tasks in the ReadStage.
The data set has grown beyond physical memory (250GB/node with 64GB of RAM),
so I know disk access is required, but are there particular settings I
should experiment with that could help relieve some read I/O pressure? I
already put memcached in front of Cassandra, so the row cache probably won't
help.
Also, this column family stores smallish documents (usually 1-100K) along
with metadata. The document is only occasionally accessed; usually only the
metadata is read or written. Would splitting the document out into a
separate column family help?


