cassandra-user mailing list archives

From aaron morton <>
Subject Re: Compaction and total disk space used for highly overwritten CF
Date Thu, 06 Oct 2011 09:13:48 GMT
You will only have tombstones in your data if you issue deletes.

What you are seeing is an artifact of the fundamental way Cassandra stores data. Once data
is written to disk it is never modified. If you overwrite a column value that has already
been committed to disk the old value is not changed. Instead the new value is held in memory
and some time later it is written to a new file (more info here

Compaction not only kersplats data that has been deleted, it kapows data that has been
overwritten. (See this link for a dramatic first-person re-creation of compaction removing an
overwritten value)
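
To make the effect concrete, here is a toy model of the behaviour described above. It is a sketch, not Cassandra code: the class and method names (ToyColumnFamily, flush, compact, cells_on_disk) are made up for illustration, and real SSTables, timestamps and compaction strategies are far more involved.

```python
# Toy model only; every name here is hypothetical and not part of Cassandra.
import itertools


class ToyColumnFamily:
    def __init__(self):
        self.memtable = {}   # key -> (timestamp, value); mutable, in memory
        self.sstables = []   # immutable snapshots standing in for files on disk
        self._clock = itertools.count(1)

    def write(self, key, value):
        # An overwrite only replaces the in-memory entry; files on disk are untouched.
        self.memtable[key] = (next(self._clock), value)

    def flush(self):
        # The memtable is written out as a brand new file.
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}

    def compact(self):
        # Merge all files, keeping only the newest version of each key.
        merged = {}
        for table in self.sstables:
            for key, (ts, value) in table.items():
                if key not in merged or ts > merged[key][0]:
                    merged[key] = (ts, value)
        self.sstables = [merged] if merged else []

    def cells_on_disk(self):
        return sum(len(table) for table in self.sstables)


cf = ToyColumnFamily()
for round_no in range(5):            # overwrite the same three keys five times
    for key in ("a", "b", "c"):
        cf.write(key, f"v{round_no}")
    cf.flush()                       # each round lands in its own file

before = cf.cells_on_disk()          # every old version is still on disk
cf.compact()
after = cf.cells_on_disk()           # only the newest version of each key remains
print(before, after)
```

In this model the on-disk footprint grows with every flushed round of overwrites and only drops back to one copy per key once the files are merged, which is the same pattern as a CF that balloons until it is compacted.
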
By overwriting all the data so often you are somewhat fighting against the server. But there
are some things you can try (am assuming 0.8.6, some general background

* reduce the min_compaction_threshold on the CF so that data on disk gets compacted more frequently.

* look at the logs to see why / when memtables are being flushed, look for lines like 
	INFO [ScheduledTasks:1] 2011-10-02 22:32:20,092 (line 1128) Enqueuing
flush of Memtable-NoCache_Ascending@921142878(2175000/13267958 serialized/live bytes, 43500
	WARN [ScheduledTasks:1] 2011-10-02 22:32:20,084 (line 143) Heap is 0.778906484049155
full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the
two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml
if you don't want Cassandra to do this automatically

* The memtable will be flushed to disk for 1 of 3 reasons:
	* The heap is too full and Cassandra wants to free memory
	* It has passed the memtable_operations CF threshold for changes; increase this value to
flush less often
	* It has passed the memtable_throughput CF threshold for throughput; increase this value
to flush less often

* if possible, reduce the amount of overwrites.
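
If you want to tally the flushes rather than eyeball the log, something like the sketch below may help. It is a rough, unofficial script: the regular expressions are written only against the two sample lines quoted above (including their line wrapping), the function name summarize is made up, and other Cassandra versions may format these messages differently.

```python
import re

# Patterns derived from the two sample log lines above; \s+ tolerates line wrapping.
FLUSH_RE = re.compile(
    r"Enqueuing\s+flush\s+of\s+Memtable-(?P<cf>[^@\s]+)@\d+"
    r"\((?P<serialized>\d+)/(?P<live>\d+)\s+serialized/live\s+bytes"
)
HEAP_RE = re.compile(r"Heap\s+is\s+(?P<fraction>\d+\.\d+)\s+full")


def summarize(log_text):
    """Return (flushes, heap_warnings) found in a chunk of log text.

    flushes is a list of (cf_name, serialized_bytes, live_bytes) tuples;
    heap_warnings is a list of heap-full fractions reported before forced flushes.
    """
    flushes = [
        (m.group("cf"), int(m.group("serialized")), int(m.group("live")))
        for m in FLUSH_RE.finditer(log_text)
    ]
    heap_warnings = [float(m.group("fraction")) for m in HEAP_RE.finditer(log_text)]
    return flushes, heap_warnings


sample = """\
 INFO [ScheduledTasks:1] 2011-10-02 22:32:20,092 (line 1128) Enqueuing
flush of Memtable-NoCache_Ascending@921142878(2175000/13267958 serialized/live bytes, 43500
 WARN [ScheduledTasks:1] 2011-10-02 22:32:20,084 (line 143) Heap is 0.778906484049155
full. You may need to reduce memtable and/or cache sizes.
"""

flushes, heap_warnings = summarize(sample)
print(flushes)
print(heap_warnings)
```

Lots of small serialized sizes next to heap warnings would suggest the flushes are being forced by memory pressure rather than by the memtable_operations or memtable_throughput thresholds.
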

Hope that helps. 

Aaron Morton
Freelance Cassandra Developer

On 6/10/2011, at 2:42 PM, Derek Andree wrote:

> We have a very hot CF which we use essentially as a durable memory cache for our application.
 It is about 70MBytes in size after being fully populated.  We completely overwrite this entire
CF every few minutes (not delete).  Our hope was that the CF would stay around 70MB in size,
but it grows to multiple Gigabytes in size rather quickly (less than an hour).  I've heard
that doing major compactions using nodetool is no longer recommended, but when we force a
compaction on this CF using nodetool compact, then perform GC, size on disk shrinks to the
expected 70MB.
> I'm wondering if we are doing something wrong here, we thought we were avoiding tombstones
since we are just overwriting each column using the same keys.  Is the fact that we have to
do a GC to get the size on disk to shrink significantly a smoking gun that we have a bunch
of tombstones?
> We've row cached the entire CF to make reads really fast, and writes are definitely fast
enough, it's this growing disk space that has us concerned.
> Here's the output from nodetool cfstats for the CF in question (hrm, I just noticed that
we still have a key cache for this CF which is rather dumb):
> 		Column Family: Test
> 		SSTable count: 4
> 		Space used (live): 309767193
> 		Space used (total): 926926841
> 		Number of Keys (estimate): 275456
> 		Memtable Columns Count: 37510
> 		Memtable Data Size: 15020598
> 		Memtable Switch Count: 22
> 		Read Count: 4827496
> 		Read Latency: 0.010 ms.
> 		Write Count: 1615946
> 		Write Latency: 0.095 ms.
> 		Pending Tasks: 0
> 		Key cache capacity: 150000
> 		Key cache size: 55762
> 		Key cache hit rate: 0.030557854052177317
> 		Row cache capacity: 150000
> 		Row cache size: 68752
> 		Row cache hit rate: 1.0
> 		Compacted row minimum size: 925
> 		Compacted row maximum size: 1109
> 		Compacted row mean size: 1109
> Any insight appreciated.
> Thanks,
> -Derek
