"remember it uses more IO than STS"

Do you mean during compactions? I thought LCS should decrease the number of disk reads when it is not compacting, since 90% of the data isn't spread across multiple sstables and C* only needs to read a single file to find the entire row. Is that right?
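A rough way to see why the compaction backlog matters for read IO: assuming that sstables within each level from L1 up do not overlap, while freshly flushed L0 sstables can all overlap, the number of candidate sstables a single-row read may consult is bounded by the L0 count plus one per non-empty level. The function name and the L0 counts below are hypothetical. Bloom filters should screen out most candidates, so this bounds candidates, not actual disk reads.

```python
def max_candidate_sstables(l0_count: int, nonempty_levels: int) -> int:
    """Upper bound on sstables a single-row read may consult under LCS.

    Hypothetical model: every L0 sstable may overlap the key, while each
    level from L1 up holds non-overlapping sstables, so at most one per level.
    """
    if l0_count < 0 or nonempty_levels < 0:
        raise ValueError("counts must be non-negative")
    return l0_count + nonempty_levels


# Compaction keeping up: only a handful of sstables waiting in L0.
print(max_candidate_sstables(l0_count=2, nonempty_levels=5))
# Compaction thousands of tasks behind: flushed sstables pile up in L0.
print(max_candidate_sstables(l0_count=400, nonempty_levels=5))
```

The "90% of reads from one sstable" behaviour only holds while L0 stays small; once flushes outpace compaction, the L0 term dominates.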


2013/3/28 aaron morton <aaron@thelastpickle.com>
> You nailed it. A significant number of reads are done from hundreds of sstables. (I should add that compaction is apparently constantly 6000-7000 tasks behind, and the vast majority of reads access recently written data.)
So that's not good. 
If IO is saturated, then maybe LCS is not for you; remember it uses more IO than STS.
Otherwise, look at the compaction settings in cassandra.yaml to see if you can make it go faster, but watch out that you don't hurt normal requests.
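A back-of-envelope sketch of how long a backlog like the 6000-7000 pending tasks takes to drain when compaction is throttled by compaction_throughput_mb_per_sec in cassandra.yaml (16 by default, as far as I know). The bytes rewritten per task (about 11 sstables of 5 MB: one from level N plus roughly ten overlapping ones from level N+1) and the function name are assumptions, and the estimate ignores new flushes that keep adding work.

```python
def drain_hours(pending_tasks: int, mb_per_task: float, throughput_mb_per_sec: float) -> float:
    """Hours to rewrite a compaction backlog at a throttled rate (no new writes)."""
    total_mb = pending_tasks * mb_per_task
    return total_mb / throughput_mb_per_sec / 3600


# Assumption: one LCS task rewrites ~11 sstables of 5 MB each.
MB_PER_TASK = 11 * 5
PENDING = 6500  # midpoint of the 6000-7000 tasks reported in this thread

for throughput in (16, 32, 64):
    print(f"{throughput} MB/s -> {drain_hours(PENDING, MB_PER_TASK, throughput):.1f} h")
```

Under these assumptions it is roughly 6.2 hours at 16 MB/s. Raising the throttle (or concurrent_compactors) trades request latency for faster catch-up, which is the trade-off to watch.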

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton

On 28/03/2013, at 7:00 AM, Wei Zhu <wz1975@yahoo.com> wrote:

Welcome to the wonderland of LCS sstable sizing. There has been some discussion around it, but no guidelines yet.

I asked the people on IRC; someone is running as high as 128 MB in production with no problems. I guess you have to test it on your system and see how it performs.
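To put the 5 MB vs 128 MB range in perspective, here is a sketch estimating how many sstables LCS keeps for this column family. It assumes every sstable is written close to sstable_size_in_mb (treated as MiB) and uses the "Space used (live)" figure from the cfstats output later in the thread; the helper name is hypothetical.

```python
MIB = 1024 * 1024
LIVE_BYTES = 119_771_434_915  # "Space used (live)" from cfstats


def estimated_sstable_count(live_bytes: int, sstable_size_in_mb: int) -> int:
    """Approximate LCS file count: live data divided by the target sstable size."""
    return round(live_bytes / (sstable_size_in_mb * MIB))


for size_mb in (5, 10, 64, 128):
    print(size_mb, estimated_sstable_count(LIVE_BYTES, size_mb))
```

At 5 MB this comes out near 22,845, close to the 22682 sstables reported; at 128 MB it drops to about 892 files.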

Attached is the related thread for your reference.

-Wei

----- Original Message -----
From: "Andras Szerdahelyi" <andras.szerdahelyi@ignitionone.com>
To: user@cassandra.apache.org
Sent: Wednesday, March 27, 2013 1:19:06 AM
Subject: Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01


Aaron,

> What version are you using?

1.1.9

> Have you changed the bf_chance? The sstables need to be rebuilt for it to take effect.

I did (several times), and I ran upgradesstables afterwards.

> Not sure what this means.
> Are you saying it's in a boat on a river, with tangerine trees and marmalade skies?

You nailed it. A significant number of reads are done from hundreds of sstables. (I should add that compaction is apparently constantly 6000-7000 tasks behind, and the vast majority of reads access recently written data.)

> Take a look at nodetool cfhistograms to get a better idea of the row size, and use that info when considering the sstable size.

Rows are around 1-20 KB. What should I optimise the LCS sstable size for? I suppose: "fit as many complete rows as possible into a single sstable to keep the file count down, while avoiding compactions of oversized (double-digit gigabyte?) sstables at higher levels"?
Do I have to run a major compaction after changing sstable_size_in_mb? The larger sstable size wouldn't really affect sstables on levels above L0, would it?
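On the "oversized sstables at higher levels" worry: as I understand LCS, every sstable targets the same sstable_size_in_mb at every level, and it is the number of sstables per level that grows, by a factor of 10 (level N holds about 10^N sstables). A sketch under that assumption, computing how many levels this column family's live data needs at 5 MB versus 128 MB; the helper name is hypothetical.

```python
MIB = 1024 * 1024
LIVE_BYTES = 119_771_434_915  # "Space used (live)" from cfstats


def levels_needed(live_bytes: int, sstable_size_in_mb: int, fanout: int = 10) -> int:
    """Smallest n such that levels L1..Ln together can hold live_bytes.

    Assumes level N holds fanout**N sstables of sstable_size_in_mb each.
    """
    capacity = 0
    level = 0
    while capacity < live_bytes:
        level += 1
        capacity += fanout ** level * sstable_size_in_mb * MIB
    return level


for size_mb in (5, 128):
    print(size_mb, levels_needed(LIVE_BYTES, size_mb))
```

This gives 5 levels at 5 MB and 3 at 128 MB; the individual files stay at the target size either way.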






Thanks!!
Andras






From: aaron morton <aaron@thelastpickle.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday 26 March 2013 21:46
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01




What version are you using?
1.2.0 allowed a null bf chance, and I think it returned 0.1 for LCS and 0.01 for STS compaction.
Have you changed the bf_chance? The sstables need to be rebuilt for it to take effect.

> and sstables read is in the skies

Not sure what this means.
Are you saying it's in a boat on a river, with tangerine trees and marmalade skies?





> SSTable count: 22682

That's a lot of files; I imagine this would dilute the effectiveness of the key cache, since it caches (sstable, key) tuples.
You may want to look at increasing sstable_size_in_mb with LCS.
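A toy model of the key cache point: if cache entries are keyed by (sstable, key) tuples, a row spread over k sstables needs k entries, so a cache of fixed capacity covers fewer distinct rows as k grows. The class, capacities and the cyclic access pattern below are hypothetical; a strictly cyclic scan is the worst case for LRU, so real hit ratios would degrade less abruptly.

```python
from collections import OrderedDict


class ToyKeyCache:
    """Hypothetical LRU cache keyed by (sstable_id, row_key) tuples."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.lookups = 0

    def lookup(self, sstable_id: int, row_key: str) -> None:
        self.lookups += 1
        entry = (sstable_id, row_key)
        if entry in self.entries:
            self.hits += 1
            self.entries.move_to_end(entry)  # mark as most recently used
        else:
            self.entries[entry] = None  # stand-in for a cached index position
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used


def hit_ratio(sstables_per_row: int, rows: int = 1000, capacity: int = 2000, rounds: int = 5) -> float:
    """Read every row `rounds` times; each read probes every sstable holding it."""
    cache = ToyKeyCache(capacity)
    for _ in range(rounds):
        for r in range(rows):
            for s in range(sstables_per_row):
                cache.lookup(s, f"row{r}")
    return cache.hits / cache.lookups


for k in (1, 2, 4):
    print(k, hit_ratio(k))
```

With the same cache budget, going from 1 to 4 sstables per row takes this toy cache from a 0.8 hit ratio to 0.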





> Compacted row minimum size: 104
> Compacted row maximum size: 263210
> Compacted row mean size: 3041

Take a look at nodetool cfhistograms to get a better idea of the row size, and use that info when considering the sstable size.


Cheers








-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand


@aaronmorton
http://www.thelastpickle.com


On 26/03/2013, at 6:16 AM, Andras Szerdahelyi <andras.szerdahelyi@ignitionone.com> wrote:




Hello list,


Could anyone shed some light on how an FP chance of 0.01 can coexist with a measured FP ratio of... 0.98? Am I reading this wrong, or do 98% of the requests that hit the bloom filter produce a false positive while the "target" false positive ratio is 0.01?
(Also, the key cache hit ratio is around 0.001 and sstables read is in the skies (no exponential drop-off, as I would expect with LCS), but that should be filed under "effect" and not "cause"?)
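One reading that would let both numbers be true, offered as an assumption rather than something confirmed in this thread: if the reported "Bloom Filter False Ratio" is false positives divided by all positive filter answers (false plus true positives), rather than the classic rate over probes for absent keys, then reads that consult many sstables not holding the key can push the ratio toward 1 while every filter still meets 0.01. The sketch applies that assumed definition to the cfstats counters in this message; the helper names and the 6,000-probe figure are hypothetical.

```python
READS = 629_009               # column family "Read Count"
FALSE_POSITIVES = 37_939_419  # "Bloom Filter False Positives"
REPORTED_RATIO = 0.97928      # "Bloom Filter False Ratio"


def implied_true_positives(false_pos: int, ratio: float) -> float:
    """Invert the ASSUMED definition ratio = fp / (fp + tp) to recover tp."""
    return false_pos * (1 - ratio) / ratio


def classic_fp_rate(false_pos: int, absent_probes: int) -> float:
    """Textbook FP rate: share of probes for absent keys the filter let through."""
    return false_pos / absent_probes


tp = implied_true_positives(FALSE_POSITIVES, REPORTED_RATIO)
print(f"false positives per read: {FALSE_POSITIVES / READS:.1f}")
print(f"implied true positives:   {tp:,.0f}")

# Hypothetical: if each read probed ~6,000 sstables that did not hold the key,
# ~60 false positives per read would be a textbook FP rate of only 1%.
print(classic_fp_rate(60, 6_000))
```

Under that assumption, the counters work out to roughly 60 false positives per read and about 800,000 true positives in total.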



[default@unknown] use KS;
Authenticated to keyspace: KS
[default@KS] describe CF;
    ColumnFamily: CF
      Key Validation Class: org.apache.cassandra.db.marshal.BytesType
      Default column value validator: org.apache.cassandra.db.marshal.BytesType
      Columns sorted by: org.apache.cassandra.db.marshal.BytesType
      GC grace seconds: 691200
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.1
      DC Local Read repair chance: 0.0
      Replicate on write: true
      Caching: ALL
      Bloom Filter FP chance: 0.01
      Built indexes: []
      Compaction Strategy: org.apache.cassandra.db.compaction.LeveledCompactionStrategy
      Compaction Strategy Options:
        sstable_size_in_mb: 5
      Compression Options:
        sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor



Keyspace: KS
    Read Count: 628950
    Read Latency: 93.19921121869784 ms.
    Write Count: 1219021
    Write Latency: 0.14352380885973254 ms.
    Pending Tasks: 0
        Column Family: CF
        SSTable count: 22682
        Space used (live): 119771434915
        Space used (total): 119771434915
        Number of Keys (estimate): 203837952
        Memtable Columns Count: 13125
        Memtable Data Size: 33212827
        Memtable Switch Count: 15
        Read Count: 629009
        Read Latency: 88.434 ms.
        Write Count: 1219038
        Write Latency: 0.095 ms.
        Pending Tasks: 0
        Bloom Filter False Positives: 37939419
        Bloom Filter False Ratio: 0.97928
        Bloom Filter Space Used: 261572784
        Compacted row minimum size: 104
        Compacted row maximum size: 263210
        Compacted row mean size: 3041
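As a sanity check on filter sizing, here is a sketch computing bits per key from "Bloom Filter Space Used" and "Number of Keys (estimate)" above, plus the textbook false positive estimate p = exp(-(m/n)(ln 2)^2) for an optimally hashed filter. This is the standard formula, not necessarily Cassandra's exact sizing logic, and the key estimate may count a row once per sstable it appears in.

```python
import math

BLOOM_FILTER_BYTES = 261_572_784  # "Bloom Filter Space Used"
ESTIMATED_KEYS = 203_837_952      # "Number of Keys (estimate)"


def bits_per_key(filter_bytes: int, keys: int) -> float:
    """Filter bits available per key (m / n)."""
    return filter_bytes * 8 / keys


def textbook_fp_rate(bits: float) -> float:
    """FP rate with the optimal hash count k = bits * ln 2: exp(-bits * (ln 2)**2)."""
    return math.exp(-bits * math.log(2) ** 2)


b = bits_per_key(BLOOM_FILTER_BYTES, ESTIMATED_KEYS)
print(f"bits per key: {b:.2f}")
print(f"textbook FP rate: {textbook_fp_rate(b):.4f}")
```

That is about 10.3 bits per key and a textbook rate around 0.7%, in the same ballpark as the configured 0.01, which suggests the filters themselves were rebuilt at the new setting.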


I upgraded sstables after changing the FP chance.


Thanks!
Andras
<attachment.eml>