incubator-cassandra-user mailing list archives

From Derek Andree <dand...@lacunasystems.com>
Subject Re: Disk usage for CommitLog
Date Tue, 30 Aug 2011 03:20:01 GMT
Thanks Dan, good info.

> First off, what version of Cassandra are you using?

Sorry my bad, 0.8.4

> Provided you are using a recent Cassandra version (late 0.7 or 0.8.x) I doubt the commit
> log is your problem. My experience using Cassandra as a time series data store (with a full
> 30 days of data + various aggregations) has been that the commit log is a trivial fraction
> of the actual data. That said, it's highly dependent on how you use your data and when it
> expires/gets deleted (with considerations for gc_grace).

We keep 5-minute data on a few thousand "objects" for 13 months.  We also do "rollup"
aggregation for generating longer time period graphs and reports, very RRD-like.  With a
few months of data, I see 86GB in commitlog and 42GB in data… but then again this is while
I'm still inserting data as fast as I can for a test case, so that may have something to
do with it :)
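In case it helps anyone picture the rollup step: it's the usual RRD-style consolidation, collapsing fixed-interval samples into coarser buckets. A toy sketch in Python (purely illustrative, not our actual code; we keep averages per bucket, but min/max/sum work the same way):

```python
from collections import defaultdict

def rollup(samples, bucket_seconds=3600):
    """Aggregate (timestamp, value) samples into fixed-width buckets,
    RRD-style: keep the average value per bucket."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Snap each timestamp down to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return {start: sum(vs) / len(vs) for start, vs in sorted(buckets.items())}

# Two hours of 5-minute samples (300 s apart), values 0..23.
samples = [(i * 300, float(i)) for i in range(24)]
hourly = rollup(samples)  # one averaged point per hour
```
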

> As one final point, as of 0.8, I would not recommend playing with per-CF flush settings.
> There are global thresholds which work far better and account for things like java
> overhead.
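For anyone following along, I believe the global threshold being referred to is the 0.8 cassandra.yaml setting below (the value shown is just illustrative; check your own yaml):

```yaml
# cassandra.yaml (0.8.x) -- global memtable flush threshold.
# Total memory permitted for all memtables combined; when exceeded,
# Cassandra flushes the largest memtables first. Defaults to roughly
# one-third of the JVM heap if left unset.
memtable_total_space_in_mb: 2048
```
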

Out of curiosity, why do global flush thresholds work better than per-CF settings?  My first
thought is that I would want finer-grained control, since my CFs can have extremely different
write/read patterns.

Thanks,
-Derek