incubator-cassandra-user mailing list archives

From Aaron Turner <synfina...@gmail.com>
Subject Re: tuning for read performance
Date Mon, 22 Oct 2012 20:06:06 GMT
On Mon, Oct 22, 2012 at 11:05 AM, feedly team <feedlydev@gmail.com> wrote:
> Hi,
>     I have a small 2 node cassandra cluster that seems to be constrained by
> read throughput. There are about 100 writes/s and 60 reads/s mostly against
> a skinny column family. Here's the cfstats for that family:
>
>  SSTable count: 13
>  Space used (live): 231920026568
>  Space used (total): 231920026568
>  Number of Keys (estimate): 356899200
>  Memtable Columns Count: 1385568
>  Memtable Data Size: 359155691
>  Memtable Switch Count: 26
>  Read Count: 40705879
>  Read Latency: 25.010 ms.
>  Write Count: 9680958
>  Write Latency: 0.036 ms.
>  Pending Tasks: 0
>  Bloom Filter False Positives: 28380
>  Bloom Filter False Ratio: 0.00360
>  Bloom Filter Space Used: 874173664
>  Compacted row minimum size: 61
>  Compacted row maximum size: 152321
>  Compacted row mean size: 1445
>
> iostat shows almost no write activity, here's a typical line:
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> sdb               0.00     0.00  312.87    0.00     6.61     0.00    43.27    23.35  105.06   2.28  71.19
>
> and nodetool tpstats always shows pending tasks in the ReadStage. The data
> set has grown beyond physical memory (250GB/node w/64GB of RAM) so I know
> disk access is required, but are there particular settings I should
> experiment with that could help relieve some read i/o pressure? I already
> put memcached in front of cassandra so the row cache probably won't help
> much.
>
> Also this column family stores smallish documents (usually 1-100K) along
> with metadata. The document is only occasionally accessed, usually only the
> metadata is read/written. Would splitting out the document into a separate
> column family help?
>

Some un-expert advice:

1. Consider Leveled Compaction (LCS) instead of Size-Tiered.  LCS
bounds how many SSTables a read has to touch, at the cost of more
compaction (write) I/O.
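For what it's worth, on a 1.1-era cluster that's a per-CF schema change
from cassandra-cli; something like the sketch below, where "Docs" is
just a placeholder name and the sstable size is only an example (newer
releases expose the same knob through CQL's ALTER TABLE ... WITH
compaction).  Switching an existing ~230GB CF will kick off a lot of
compaction while the levels get built, so do it off-peak:

    update column family Docs
      with compaction_strategy = 'LeveledCompactionStrategy'
      and compaction_strategy_options = {sstable_size_in_mb: 10};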

2. You said "skinny column family", which I took to mean not a lot of
columns per row.  See if you can organize your data into wider rows, so
a single query reads one wide row instead of many skinny ones: fewer
queries and fewer disk seeks.
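As a rough sketch (all names here are made up): instead of one row per
document keyed by doc id, key a row by something like user id and store
one column per document, so pulling everything for a user becomes a
single row read:

    create column family UserDocs
      with key_validation_class = UTF8Type
      and comparator = TimeUUIDType
      and default_validation_class = BytesType;

Row key = user id, column name = a time-ordered doc id, value = the
metadata blob.  Whether that fits depends entirely on your access
pattern.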

3. Enable compression if you haven't already.
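On 1.1 that's also a per-CF setting in cassandra-cli; SnappyCompressor
with a 64KB chunk size were the usual defaults of that era, so treat
these values as a starting point rather than a recommendation:

    update column family Docs with compression_options =
      {sstable_compression: SnappyCompressor, chunk_length_kb: 64};

The win is mostly that more of your hot data fits in the OS page cache.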

4. Splitting your data from your metadata could definitely help.  I
like separating my read-heavy from my write-heavy CFs because generally
speaking they benefit from different compaction methods.  But don't go
crazy creating thousands of CFs either.
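A sketch of what that split might look like (CF names are made up):
keep the small, hot metadata in a CF tuned for reads, and the rarely
touched document bodies in a plain size-tiered CF:

    create column family DocMeta
      with comparator = UTF8Type
      and compaction_strategy = 'LeveledCompactionStrategy';

    create column family DocBody
      with comparator = UTF8Type
      and compaction_strategy = 'SizeTieredCompactionStrategy';

That keeps the metadata CF's sstables, bloom filters and key cache
small even though the documents dominate your disk usage.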

Hope that gives you some ideas to investigate further!


-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"
