cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: tuning for read performance
Date Tue, 23 Oct 2012 07:30:09 GMT
>> and nodetool tpstats always shows pending tasks in the ReadStage.
Are clients reading a single row at a time or multiple rows? Each row requested in a
multiget becomes a separate task in the read stage.
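
As a back-of-the-envelope illustration of that point, the sketch below uses the figures from the original post (about 60 reads/s and 25 ms read latency). The function names and the 20-row multiget size are assumptions made up for illustration, and treating the cfstats latency as the full time a task spends in the stage is only an approximation.

```python
def read_stage_tasks_per_sec(requests_per_sec, rows_per_request):
    """Each row requested in a multiget becomes its own read-stage task."""
    return requests_per_sec * rows_per_request


def avg_tasks_in_flight(tasks_per_sec, latency_sec):
    """Little's law: average concurrency = arrival rate * time in system."""
    return tasks_per_sec * latency_sec


if __name__ == "__main__":
    # 1 row per request vs. a hypothetical 20-row multiget
    for rows in (1, 20):
        tasks = read_stage_tasks_per_sec(60, rows)
        print(rows, tasks, avg_tasks_in_flight(tasks, 0.025))
```

If the clients do use multigets, the same request rate puts proportionally more work into the ReadStage, which would be consistent with always seeing pending tasks there.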

Also look at the type of query you are sending. I talked a little about the performance of
different query techniques at Cassandra SF: http://www.datastax.com/events/cassandrasummit2012/presentations

 
> 1. Consider Leveled compaction instead of Size Tiered.  LCS improves
> read performance at the cost of more writes.
I would look at other options first. 
If you want to know how many SSTables a read is hitting, look at nodetool cfhistograms.
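
One way to summarize that histogram, as a minimal sketch: it assumes you have copied the SSTables column out of the cfhistograms output by hand as (SSTables touched, number of reads) pairs. The helper name and the sample numbers are invented for illustration and are not from this cluster.

```python
def sstables_per_read_summary(buckets, threshold=2):
    """Summarize a list of (sstables_touched, read_count) pairs.

    Returns (mean SSTables per read, fraction of reads touching
    more than `threshold` SSTables).
    """
    total = sum(count for _, count in buckets)
    if total == 0:
        return 0.0, 0.0
    mean = sum(n * count for n, count in buckets) / total
    over = sum(count for n, count in buckets if n > threshold) / total
    return mean, over


if __name__ == "__main__":
    # Made-up sample: most reads hit one SSTable, a few hit three or four.
    sample = [(1, 700), (2, 200), (3, 80), (4, 20)]
    print(sstables_per_read_summary(sample))
```

If most reads already touch only one or two SSTables, changing the compaction strategy is unlikely to be the biggest win.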

> 2. You said "skinny column family" which I took to mean not a lot of
> columns/row.  See if you can organize your data into wider rows which
> allow reading fewer rows and thus fewer queries/disk seeks.

Wide rows generally take longer to read than narrow ones, so making rows artificially wide
may slow reads down rather than speed them up.


> 4. Splitting your data from your MetaData could definitely help.  I
> like separating my read heavy from write heavy CF's because generally
> speaking they benefit from different compaction methods.  But don't go
> crazy creating 1000's of CF's either.

+1
25 ms read latency is high. 
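
A rough cross-check using the iostat line and request rate quoted from the original post. This is a sketch under stated assumptions: the 60 reads/s may be cluster-wide while iostat is per node, and other activity can also read from disk, so the ratio is only indicative. The function names are made up for illustration.

```python
SECTOR_BYTES = 512  # iostat reports avgrq-sz in 512-byte sectors


def avg_request_kib(avgrq_sz_sectors):
    """Average size of one disk request, in KiB."""
    return avgrq_sz_sectors * SECTOR_BYTES / 1024


def disk_reads_per_client_read(disk_reads_per_sec, client_reads_per_sec):
    """How many disk reads are issued per client read, on average."""
    return disk_reads_per_sec / client_reads_per_sec


if __name__ == "__main__":
    # Values taken from the quoted iostat line and the stated read rate.
    print(round(avg_request_kib(43.27), 1))                    # about 21.6 KiB
    print(round(disk_reads_per_client_read(312.87, 60), 1))    # about 5.2
```

Several disk reads per client read, each waiting behind a queue of about 23 requests, would go a long way toward explaining the latency.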

Hope that helps. 

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/10/2012, at 9:06 AM, Aaron Turner <synfinatic@gmail.com> wrote:

> On Mon, Oct 22, 2012 at 11:05 AM, feedly team <feedlydev@gmail.com> wrote:
>> Hi,
>>    I have a small 2 node cassandra cluster that seems to be constrained by
>> read throughput. There are about 100 writes/s and 60 reads/s mostly against
>> a skinny column family. Here's the cfstats for that family:
>> 
>> SSTable count: 13
>> Space used (live): 231920026568
>> Space used (total): 231920026568
>> Number of Keys (estimate): 356899200
>> Memtable Columns Count: 1385568
>> Memtable Data Size: 359155691
>> Memtable Switch Count: 26
>> Read Count: 40705879
>> Read Latency: 25.010 ms.
>> Write Count: 9680958
>> Write Latency: 0.036 ms.
>> Pending Tasks: 0
>> Bloom Filter False Postives: 28380
>> Bloom Filter False Ratio: 0.00360
>> Bloom Filter Space Used: 874173664
>> Compacted row minimum size: 61
>> Compacted row maximum size: 152321
>> Compacted row mean size: 1445
>> 
>> iostat shows almost no write activity, here's a typical line:
>> 
>> Device:   rrqm/s  wrqm/s     r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz   await  svctm  %util
>> sdb         0.00    0.00  312.87  0.00   6.61   0.00     43.27     23.35  105.06   2.28  71.19
>> 
>> and nodetool tpstats always shows pending tasks in the ReadStage. The data
>> set has grown beyond physical memory (250GB/node w/64GB of RAM) so I know
>> disk access is required, but are there particular settings I should
>> experiment with that could help relieve some read i/o pressure? I already
>> put memcached in front of cassandra so the row cache probably won't help
>> much.
>> 
>> Also this column family stores smallish documents (usually 1-100K) along
>> with metadata. The document is only occasionally accessed, usually only the
>> metadata is read/written. Would splitting out the document into a separate
>> column family help?
>> 
> 
> Some un-expert advice:
> 
> 1. Consider Leveled compaction instead of Size Tiered.  LCS improves
> read performance at the cost of more writes.
> 
> 2. You said "skinny column family" which I took to mean not a lot of
> columns/row.  See if you can organize your data into wider rows which
> allow reading fewer rows and thus fewer queries/disk seeks.
> 
> 3. Enable compression if you haven't already.
> 
> 4. Splitting your data from your MetaData could definitely help.  I
> like separating my read heavy from write heavy CF's because generally
> speaking they benefit from different compaction methods.  But don't go
> crazy creating 1000's of CF's either.
> 
> Hope that gives you some ideas to investigate further!
> 
> 
> -- 
> Aaron Turner
> http://synfin.net/         Twitter: @synfinatic
> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
> Those who would give up essential Liberty, to purchase a little temporary
> Safety, deserve neither Liberty nor Safety.
>    -- Benjamin Franklin
> "carpe diem quam minimum credula postero"

