What does your data model look like?

I think it would be best to just disable compactions.

Why? Are you never doing reads? There is also a cost to repairs and bootstrapping when you have a ton of SSTables. This might be a premature optimization.

If the data is read from a slice of a partition that has been written to over time, there will be a part of that row in almost every SSTable. That means all of them would have to be read to service the query (with multiple disk seeks per SSTable, depending on the clustering order). The data model can help or hurt a lot, though.
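To make the read cost concrete, here is a toy Python model, not Cassandra code. It assumes writes are spread round-robin over a few partitions, that every `flush_every` writes the memtable is flushed to a new SSTable that is never compacted, and that a read of a partition must consult every SSTable holding a fragment of it. It ignores bloom filter false positives, caches and other read-path optimizations. The function names are hypothetical.

```python
from collections import defaultdict


def simulate_flushes(num_writes, flush_every, partitions):
    """Return a list of SSTables, each a dict: partition key -> list of timestamps.

    Writes go round-robin over `partitions`, so every partition keeps
    receiving data over time. Every `flush_every` writes the memtable is
    flushed to a new SSTable, and SSTables are never compacted.
    """
    sstables = []
    memtable = defaultdict(list)
    for ts in range(num_writes):
        key = partitions[ts % len(partitions)]
        memtable[key].append(ts)
        if (ts + 1) % flush_every == 0:
            sstables.append(dict(memtable))
            memtable = defaultdict(list)
    if memtable:  # flush whatever is left over
        sstables.append(dict(memtable))
    return sstables


def sstables_touched(sstables, key):
    """Number of SSTables a read of partition `key` has to consult."""
    return sum(1 for table in sstables if key in table)


def compact(sstables):
    """Merge all SSTables into one, roughly what a major compaction yields."""
    merged = defaultdict(list)
    for table in sstables:
        for key, rows in table.items():
            merged[key].extend(rows)
    return [dict(merged)]


if __name__ == "__main__":
    tables = simulate_flushes(10_000, 1_000, ["p0", "p1", "p2"])
    print(len(tables))                             # → 10
    print(sstables_touched(tables, "p0"))          # → 10
    print(sstables_touched(compact(tables), "p0"))  # → 1
```

With compaction disabled, the number of SSTables a partition read touches grows with every flush; after merging, the same read touches one.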

If you set a TTL on the columns you add, C* will clean up SSTables (with size-tiered compaction, post 1.2) once the data has expired. Since you never delete, set gc_grace_seconds to 0 so that expired TTL data doesn't linger as tombstones.
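A simplified Python model of when a whole SSTable of TTL'd data becomes droppable, under stated assumptions: a cell expires at `write_time + ttl`, and the SSTable can be dropped once its latest expiration is older than `now - gc_grace_seconds`. Real Cassandra performs additional checks (for example, against overlapping SSTables) that are not modeled here, and `Cell` and `fully_expired` are hypothetical names. The default gc_grace_seconds is 864000 (10 days).

```python
from dataclasses import dataclass


@dataclass
class Cell:
    write_time: int  # seconds since epoch when the cell was written
    ttl: int         # time-to-live in seconds

    @property
    def expires_at(self) -> int:
        return self.write_time + self.ttl


def fully_expired(cells, now, gc_grace_seconds):
    """True if every cell in the SSTable expired more than gc_grace_seconds ago,
    so (in this model) the file could be dropped without rewriting it."""
    gc_before = now - gc_grace_seconds
    return max(c.expires_at for c in cells) < gc_before


if __name__ == "__main__":
    DAY = 86_400
    cells = [Cell(write_time=0, ttl=7 * DAY), Cell(write_time=3_600, ttl=7 * DAY)]
    now = 7 * DAY + 3_601  # one second after the last cell expired
    print(fully_expired(cells, now, gc_grace_seconds=0))        # → True
    print(fully_expired(cells, now, gc_grace_seconds=864_000))  # → False
```

With gc_grace_seconds at 0 the file qualifies as soon as its last cell expires; with the default it sits on disk for roughly ten more days.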

Chris Lohfink 

On May 6, 2014, at 7:55 PM, Kevin Burton <burton@spinn3r.com> wrote:

I'm looking at storing log data in Cassandra… 

Every record has a unique timestamp as the key and the log line as the value.

I think it would be best to just disable compactions.

- There will never be any deletes.

- All the data will be accessed by time range (probably partitioned randomly) and sequentially.

So every time a memtable flushes, we will just keep that SSTable forever.  

Compacting the data is kind of redundant in this situation.

I was thinking the best strategy is to use setcompactionthreshold and set the value VERY high so compactions are never triggered.
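Why a huge threshold stops compactions can be seen in a simplified Python sketch of size-tiered bucketing. It assumes the size-tiered defaults (bucket_low 0.5, bucket_high 1.5, min_sstable_size 50 MB, min_threshold 4, max_threshold 32): SSTables of similar size are grouped into buckets, and only a bucket with at least min_threshold members is compacted. This is an approximation of the strategy, not Cassandra's implementation, and the function names are hypothetical.

```python
MB = 2 ** 20


def bucket_sstables(sizes, bucket_low=0.5, bucket_high=1.5,
                    min_sstable_size=50 * MB):
    """Group SSTable sizes (in bytes) into buckets of similar size."""
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            similar = bucket_low * avg < size < bucket_high * avg
            both_small = size < min_sstable_size and avg < min_sstable_size
            if similar or both_small:
                bucket.append(size)
                break
        else:  # no existing bucket fits, so start a new one
            buckets.append([size])
    return buckets


def compaction_candidates(sizes, min_threshold=4, max_threshold=32):
    """Buckets big enough to trigger a minor compaction, capped at max_threshold."""
    return [b[:max_threshold] for b in bucket_sstables(sizes)
            if len(b) >= min_threshold]


if __name__ == "__main__":
    flushed = [64 * MB] * 10  # ten memtable flushes of similar size
    print(len(compaction_candidates(flushed)))                    # → 1
    print(compaction_candidates(flushed, 1_000_000, 1_000_000))   # → []
```

Raising min_threshold past the number of SSTables that will ever accumulate in one bucket means no bucket ever qualifies, which is the effect the proposal relies on.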

Also, it would be IDEAL to be able to tell Cassandra to just drop a full SSTable so that I can truncate older data without having to do a major compaction and without having to mark everything with a tombstone. Is this possible?


Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
… or check out my Google+ profile
War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.