incubator-cassandra-user mailing list archives

From Derek Andree <dand...@lacunasystems.com>
Subject Compaction and total disk space used for highly overwritten CF
Date Thu, 06 Oct 2011 01:42:52 GMT
We have a very hot CF which we use essentially as a durable memory cache for our application.
It is about 70 MB in size once fully populated. We completely overwrite the entire CF every
few minutes (overwrite, not delete). Our hope was that the CF would stay around 70 MB, but it
grows to multiple gigabytes rather quickly (in less than an hour). I've heard that major
compactions using nodetool are no longer recommended, but when we force a compaction on this
CF using nodetool compact and then perform a GC, the size on disk shrinks to the expected
70 MB.
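
To illustrate why full overwrites could inflate disk usage even with no deletes, here is a toy model in Python. It is only a sketch under simplifying assumptions, not Cassandra's actual storage engine: each flush is modeled as an immutable table holding one timestamped copy of every key, and the names flush and compact are hypothetical helpers. Compaction is modeled as keeping the newest copy of each key.

```python
def flush(keys, timestamp):
    """Model one immutable SSTable: every key rewritten at this timestamp."""
    return {key: (timestamp, f"value-{timestamp}") for key in keys}


def compact(sstables):
    """Merge SSTables, keeping only the newest (highest-timestamp) copy per key."""
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    return merged


# Hypothetical workload: 1000 keys, fully overwritten 10 times.
keys = [f"row{i}" for i in range(1000)]
sstables = [flush(keys, ts) for ts in range(10)]

# Stored copies before compaction: one per key per overwrite.
cells_before = sum(len(table) for table in sstables)

# Stored copies after compaction: one per key.
compacted = compact(sstables)
cells_after = len(compacted)

print(cells_before, cells_after)
```

In this model the space on disk scales with the number of overwrites until a compaction merges the copies, with no tombstones involved at all.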

I'm wondering if we are doing something wrong here. We thought we were avoiding tombstones,
since we are just overwriting each column using the same keys. Is the fact that we have to
do a GC to get the size on disk to shrink significantly a smoking gun that we have a bunch
of tombstones?

We've row cached the entire CF to make reads really fast, and writes are definitely fast enough;
it's the growing disk space that has us concerned.

Here's the output from nodetool cfstats for the CF in question (hrm, I just noticed that we
still have a key cache for this CF which is rather dumb):

		Column Family: Test
		SSTable count: 4
		Space used (live): 309767193
		Space used (total): 926926841
		Number of Keys (estimate): 275456
		Memtable Columns Count: 37510
		Memtable Data Size: 15020598
		Memtable Switch Count: 22
		Read Count: 4827496
		Read Latency: 0.010 ms.
		Write Count: 1615946
		Write Latency: 0.095 ms.
		Pending Tasks: 0
		Key cache capacity: 150000
		Key cache size: 55762
		Key cache hit rate: 0.030557854052177317
		Row cache capacity: 150000
		Row cache size: 68752
		Row cache hit rate: 1.0
		Compacted row minimum size: 925
		Compacted row maximum size: 1109
		Compacted row mean size: 1109
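
For reference, a small Python sketch of the arithmetic on the two "Space used" figures above. It assumes "70 MB" means 70 * 1024 * 1024 bytes, and it assumes the gap between total and live is space held by SSTables that are no longer live but have not yet been removed from disk; parse_space_used is a hypothetical helper name.

```python
import re

# The two "Space used" lines copied from the cfstats output above.
CFSTATS = """\
Space used (live): 309767193
Space used (total): 926926841
"""


def parse_space_used(text):
    """Return {'live': bytes, 'total': bytes} from cfstats-style text."""
    pairs = re.findall(r"Space used \((\w+)\): (\d+)", text)
    return {kind: int(size) for kind, size in pairs}


space = parse_space_used(CFSTATS)

# Assumption: the expected fully populated size is 70 MiB.
expected = 70 * 1024 * 1024

obsolete = space["total"] - space["live"]   # bytes not counted as live
live_ratio = space["live"] / expected       # live data vs. expected size
total_ratio = space["total"] / expected     # total on disk vs. expected size

print(obsolete, round(live_ratio, 2), round(total_ratio, 2))
```

Under those assumptions, about 617 MB of the total is not live, the live data is roughly 4.2 times the expected size, and the total on disk is roughly 12.6 times the expected size.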


Any insight appreciated.

Thanks,
-Derek

