incubator-cassandra-user mailing list archives

From Rob Coli <>
Subject Re: Cassandra disk space utilization WAY higher than I would expect
Date Fri, 06 Aug 2010 21:49:36 GMT
On 8/5/10 11:51 AM, Peter Schuller wrote:
> Also, the variation in disk space in your most recent post looks
> entirely as expected to me and nothing really extreme. The temporary
> disk space occupied during the compact/cleanup would easily be as high
> as your original disk space usage to begin with, and the fact that
> you're reaching the 5-7 GB per node level after a cleanup has
> completed fully and all obsolete sstables have been removed

Your post refers to "obsolete" sstables, but the only thing that makes 
them "obsolete" in this case is that they have been compacted?

As I understand Julie's case, she is:

a) initializing her cluster
b) inserting some number of unique keys with CL.ALL
c) noticing that more disk space (6x?) than is expected is used
d) but that she gets expected usage if she does a major compaction

In other words, the problem isn't "temporary disk space occupied during 
the compact", it's permanent disk space occupied unless she compacts.

There is clearly overhead from there being multiple SSTables with 
multiple bloom filters and multiple indexes. But from my understanding, 
that does not fully account for the difference in disk usage she is 
seeing. If it is 6x across the whole cluster, it seems unlikely that the 
meta information is 5x the size of the actual information.

I haven't been following this thread very closely, but I don't think 
"obsolete" SSTables should be relevant, because she's not doing UPDATE 
or DELETE and she hasn't changed cluster topology (the "cleanup" case).

Julie: when compaction occurs, it logs the number of bytes that it 
started with and the number it ended with, as well as the number of keys 
involved in the compaction. What do these messages say?

Example line:

INFO [COMPACTION-POOL:1] 2010-08-06 13:48:00,328 
(line 398) Compacted to /path/to/MyColumnFamily-26-Data.db. 
999999999/888888888 bytes for 12345678 keys.  Time: 123456ms.
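
If there are many of these lines, a small script can pull the numbers out. 
This is only a rough sketch: the pattern is copied from the example line 
above, it assumes each message sits on a single line in the real log, and it 
assumes the first byte count is the size before compaction and the second 
the size after. The function and field names are made up for illustration.

```python
import re

# Pattern taken from the example line above; the exact wording may differ
# between Cassandra versions, so treat it as an assumption.
COMPACTED = re.compile(
    r"Compacted to (?P<path>\S+)\.\s+"
    r"(?P<bytes_before>\d+)/(?P<bytes_after>\d+) bytes "
    r"for (?P<keys>\d+) keys\.\s+Time: (?P<millis>\d+)ms"
)


def parse_compaction_line(line):
    """Return the numbers from one 'Compacted to' line, or None if no match."""
    m = COMPACTED.search(line)
    if m is None:
        return None
    result = {"path": m.group("path")}
    for name in ("bytes_before", "bytes_after", "keys", "millis"):
        result[name] = int(m.group(name))
    # Fraction of the input bytes that remained after compaction
    # (None when the input size is reported as zero).
    before = result["bytes_before"]
    result["ratio"] = result["bytes_after"] / before if before else None
    return result


def parse_compaction_lines(lines):
    """Parse every matching line from an iterable of log lines."""
    parsed = (parse_compaction_line(line) for line in lines)
    return [p for p in parsed if p is not None]
```

Feeding it the lines of a node's log (for example an open file object) 
gives one record per compaction. A ratio far below 1 would mean compaction 
really is throwing away a lot of bytes; a ratio near 1 would point elsewhere.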

