incubator-cassandra-user mailing list archives

From Derek Andree <dand...@lacunasystems.com>
Subject Disk usage for CommitLog
Date Tue, 30 Aug 2011 01:04:49 GMT
I run a single-node Cassandra instance, and we have lots of overwrites on a hot CF, and disk
utilization seems to grow pretty fast. We've noticed that when we restart Cassandra, disk
utilization decreases dramatically (close to 50%). Most of this growth seems to be in the
commitlog directory, whose files are replayed when Cassandra starts and then removed.

So I understand that writes go to the commit log, then to the memtable, and then to an SSTable. I'm curious
when the commit logs get cleaned up: is it during a compaction, or once everything in
the commit log has been written to SSTables? Is there an easy way to keep commit log size down without
killing performance?
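To make the two candidate answers concrete, here is a toy model of the rule as I understand Cassandra's design: a commit log segment can be deleted once every column family that wrote into it has flushed its memtable to an SSTable, and compaction plays no part. Everything below (`ToyCommitLog`, `WRITES_PER_SEGMENT`, counting segment capacity in writes rather than bytes) is a hypothetical simplification, not Cassandra's actual code.

```python
from dataclasses import dataclass, field

# Hypothetical segment capacity, counted in writes for simplicity.
# Real commit log segments are files sized in bytes.
WRITES_PER_SEGMENT = 4


@dataclass
class Segment:
    seg_id: int
    writes: int = 0
    # Column families that have unflushed data in this segment.
    dirty_cfs: set = field(default_factory=set)


class ToyCommitLog:
    """Toy model: a segment becomes deletable only after every CF that
    wrote into it has flushed its memtable. Compaction plays no part."""

    def __init__(self):
        self.segments = [Segment(0)]
        self.memtables = {}  # cf name -> {key: latest value}

    def write(self, cf, key, value):
        seg = self.segments[-1]
        if seg.writes >= WRITES_PER_SEGMENT:
            seg = Segment(seg.seg_id + 1)  # roll over to a new active segment
            self.segments.append(seg)
        seg.writes += 1                    # every write is logged, overwrites included
        seg.dirty_cfs.add(cf)
        self.memtables.setdefault(cf, {})[key] = value  # memtable keeps only the latest value

    def flush(self, cf):
        # A real flush would write the memtable out as an SSTable here.
        self.memtables.pop(cf, None)
        for seg in self.segments:
            seg.dirty_cfs.discard(cf)
        # Delete older segments that no CF still needs; keep the active one.
        active = self.segments[-1]
        self.segments = [s for s in self.segments[:-1] if s.dirty_cfs] + [active]


if __name__ == "__main__":
    log = ToyCommitLog()
    for i in range(10):
        log.write("hot_cf", i % 2, i)      # 10 writes, only 2 distinct keys
    print(len(log.segments), len(log.memtables["hot_cf"]))  # 3 2
    log.flush("hot_cf")
    print(len(log.segments))               # 1
```

In this model the overwrites collapse to two keys in the memtable, yet all ten writes occupy the log until the flush. It also shows one way the log can keep growing: if a rarely written CF has unflushed data in an old segment, that segment survives even after the hot CF flushes.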

I've read this:

http://wiki.apache.org/cassandra/MemtableThresholds

Since larger memtables help to absorb overwrites, I'd like to increase MemTableThroughputInMB
and maybe play with MemtableOperationsInMillions as well, but I'm wondering whether this will lead
to even higher disk utilization in the commitlog directory. It seems like larger memtables
would naturally mean more disk utilization by the commit logs.
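One way to reason about that trade-off is a small simulation. This is a minimal sketch under hypothetical assumptions: every write appends a fixed `bytes_per_write` to the log (overwrites included), segments hold `segment_bytes`, a flush happens every `writes_per_flush` writes (loosely analogous to an operation-count threshold like MemtableOperationsInMillions), and after each flush every segment except the active one is deleted.

```python
def peak_commitlog_bytes(total_writes: int, writes_per_flush: int,
                         bytes_per_write: int, segment_bytes: int) -> int:
    """Peak commit log bytes on disk in a toy model (all parameters hypothetical).

    Assumes every write appends bytes_per_write to the log (overwrites included),
    segments roll over when full, and after each flush every segment except the
    active one is deleted.
    """
    segments = [0]  # bytes in each segment still on disk; the last one is active
    peak = 0
    for n in range(1, total_writes + 1):
        if segments[-1] + bytes_per_write > segment_bytes:
            segments.append(0)            # active segment is full: start a new one
        segments[-1] += bytes_per_write
        peak = max(peak, sum(segments))
        if n % writes_per_flush == 0:     # flush: older segments become deletable
            segments = [segments[-1]]
    return peak


if __name__ == "__main__":
    for per_flush in (20, 40):
        print(per_flush, peak_commitlog_bytes(100, per_flush, 100, 1000))
    # 20 3000
    # 40 5000
```

In this model, doubling the flush interval from 20 to 40 writes raises the peak from 3000 to 5000 bytes, so a larger memtable threshold does mean a larger commit log between flushes, while peak usage stays bounded by the flush interval rather than by total writes.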

Our write load is very predictable and always the same: tons of writes of time-series statistics
every 5 minutes.

While I'm fine with temporary commit logs growing in size, I'm wondering whether we should be forcing
compactions, forcing GC, or doing some other form of cleanup to keep them from getting too big.
Mainly I just need to know how much disk utilization I can expect from a given number of
writes, and whether there is some "fudge factor" I should account for with commit
logs. Any advice appreciated.
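As a rough way to frame the fudge factor, here is a worked bound under a simplified, hypothetical model: a single CF, every write appends $b$ bytes to the log, segments are files of $S$ bytes deleted whole, a flush happens every $W$ writes, and after a flush all segments except the active one are deleted. Each segment then holds $\lfloor S/b \rfloor$ entries, and just before a flush at most one carried-over segment plus the segments for $W$ new entries are on disk:

$$
D_{\max} \le \left( \left\lceil \frac{W}{\lfloor S/b \rfloor} \right\rceil + 1 \right) S
$$

With hypothetical numbers $S = 10^8$ bytes (100 MB), $b = 100$ bytes and $W = 5 \times 10^6$ writes, this gives $(5 + 1) \times 100\ \text{MB} = 600\ \text{MB}$ against $W b = 500\ \text{MB}$ of raw log entries, a factor of about 1.2. The model ignores writes that arrive while a flush is in progress and other CFs that may keep older segments alive, so real usage can be higher.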

Thanks,
-Derek