incubator-cassandra-user mailing list archives

From Ben Bromhead <...@instaclustr.com>
Subject Re: Storing log structured data in Cassandra without compactions for performance boost.
Date Wed, 07 May 2014 22:22:56 GMT
If you make the timestamp the partition key you won't be able to do range queries (unless you
use an ordered partitioner).
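
To illustrate why, here is a rough Python sketch. It uses MD5 as a stand-in for a hash partitioner (the token function and the timestamps are illustrative assumptions, not Cassandra's actual code) and shows that keys that are consecutive in time end up scattered in token order:

```python
import hashlib


def token(partition_key: str) -> int:
    """Stand-in for a hash partitioner: map a key to a large integer token."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


# Twenty consecutive one-second timestamps (arbitrary values) used directly
# as partition keys.
timestamps = [str(1399501376 + i) for i in range(20)]
tokens = [token(ts) for ts in timestamps]

# Order the keys the way the ring would see them: by token, not by time.
keys_in_token_order = [ts for _, ts in sorted(zip(tokens, timestamps))]

# The keys were generated in time order, but sorting by token shuffles them,
# so a contiguous time range is not a contiguous token range.
print(keys_in_token_order == timestamps)
```

With an order-preserving partitioner the token order would follow the key order, which is what makes range scans over partition keys possible there.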

Assuming you are logging from multiple devices, you will want your partition key to be the
device id and the date, your clustering key to be the timestamp (timeuuids are good for
preventing collisions), and the log message, level, etc. as the other columns.
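
As a rough sketch of that layout, the Python below just builds the CQL strings. The table and column names (device_id, day, ts, log_level, message) are my own illustrative choices, so adjust them to your data:

```python
def create_log_table_cql(table_name: str) -> str:
    """Build a CREATE TABLE statement for a per-device log table.

    Partition key: (device_id, day). Clustering key: ts (a timeuuid).
    All names here are hypothetical examples.
    """
    return (
        f"CREATE TABLE {table_name} (\n"
        "    device_id text,\n"
        "    day text,\n"
        "    ts timeuuid,\n"
        "    log_level text,\n"
        "    message text,\n"
        "    PRIMARY KEY ((device_id, day), ts)\n"
        ");"
    )


def range_query_cql(table_name: str) -> str:
    """Build a time-range query within one (device_id, day) partition."""
    return (
        f"SELECT ts, log_level, message FROM {table_name} "
        "WHERE device_id = ? AND day = ? AND ts >= ? AND ts < ?;"
    )


print(create_log_table_cql("device_logs"))
print(range_query_cql("device_logs"))
```

Because ts is the clustering key, rows inside a partition are stored in time order, so the range query only has to read a slice of a single partition.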

Then you can also create a new table for every week (or day/month, depending on how much granularity
you want) and just write to the current week's table. This step allows you to delete old data
without Cassandra using tombstones (you just drop the table for the week of logs you want
to delete).
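
A minimal sketch of the weekly rotation, again in Python generating CQL strings. The naming scheme (logs_<ISO year>_w<ISO week>) and the helper names are assumptions for illustration only:

```python
from datetime import date, timedelta


def weekly_table_name(day: date, prefix: str = "logs") -> str:
    """Name of the table that holds logs for the ISO week containing `day`."""
    iso_year, iso_week, _ = day.isocalendar()
    # Zero-pad the week so names sort chronologically as plain strings.
    return f"{prefix}_{iso_year}_w{iso_week:02d}"


def drop_statements(existing_tables, today: date, keep_weeks: int):
    """DROP TABLE statements for weekly tables older than `keep_weeks` weeks."""
    cutoff_name = weekly_table_name(today - timedelta(weeks=keep_weeks))
    return [
        f"DROP TABLE {name};"
        for name in sorted(existing_tables)
        if name < cutoff_name
    ]


today = date(2014, 5, 7)
print(weekly_table_name(today))
print(drop_statements(
    ["logs_2014_w10", "logs_2014_w18", "logs_2014_w19"], today, keep_weeks=4
))
```

Writers always target weekly_table_name(today), and a periodic job runs the DROP statements. Dropping a table removes its files outright, so no tombstones are written.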

For a much clearer explanation see http://www.slideshare.net/patrickmcfadin/cassandra-20-and-timeseries
(the last few slides).

As for compaction, I would leave it enabled, as having lots of SSTables hanging around can make
range queries slower (the query has more files to visit). See http://stackoverflow.com/questions/8917882/cassandra-sstables-and-compaction
(a little old but still relevant). Compaction also fixes up things like merging row fragments
(when you write new columns to the same row).
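
To make the row-fragment point concrete, here is a deliberately simplified Python model (not Cassandra's implementation; it ignores tombstones and tie-breaking). Each SSTable holds a fragment of the same row, and merging keeps the newest value per column:

```python
def merge_fragments(fragments):
    """Merge fragments of one row, keeping the newest write per column.

    Each fragment maps column name -> (write_timestamp, value).
    """
    merged = {}
    for fragment in fragments:
        for column, (write_ts, value) in fragment.items():
            if column not in merged or write_ts > merged[column][0]:
                merged[column] = (write_ts, value)
    return merged


# The same row written at three different times, flushed to three SSTables.
sstable_a = {"message": (1, "disk full")}
sstable_b = {"log_level": (2, "WARN")}
sstable_c = {"log_level": (3, "ERROR")}

row = merge_fragments([sstable_a, sstable_b, sstable_c])
print(row)
```

Without compaction a read has to do this merge across all three files every time. After compaction the row lives in one file and the merge has already been done.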


Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359


On 07/05/2014, at 10:55 AM, Kevin Burton <burton@spinn3r.com> wrote:

> I'm looking at storing log data in Cassandra… 
> 
> Every record is a unique timestamp for the key, and then the log line for the value.
> 
> I think it would be best to just disable compactions.
> 
> - there will never be any deletes.
> 
> - all the data will be accessed in time range (probably partitioned randomly) and sequentially.
> 
> So every time a memtable flushes, we will just keep that SSTable forever.  
> 
> Compacting the data is kind of redundant in this situation.
> 
> I was thinking the best strategy is to use setcompactionthreshold and set the value VERY
> high so compactions are never triggered.
> 
> Also, It would be IDEAL to be able to tell cassandra to just drop a full SSTable so that
> I can truncate older data without having to do a major compaction and without having to mark
> everything with a tombstone.  Is this possible?
> 
> 
> 
> -- 
> 
> Founder/CEO Spinn3r.com
> Location: San Francisco, CA
> Skype: burtonator
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
> War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
> 

