cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Peter Schuller <>
Subject Re: Cassandra disk space utilization WAY higher than I would expect
Date Fri, 06 Aug 2010 22:30:23 GMT
> Your post refers to "obsolete" sstables, but the only thing that makes them
> "obsolete" in this case is that they have been compacted?


> As I understand Julie's case, she is :
> a) initializing her cluster
> b) inserting some number of unique keys with CL.ALL
> c) noticing that more disk space (6x?) than is expected is used
> d) but that she gets expected usage if she does a major compaction
> In other words, the problem isn't "temporary disk space occupied during the
> compact", it's permanent disk space occupied unless she compacts.

Sorry, I re-read my previous message and I wasn't being very clear and
I was probably a bit confused too ;)

The reason I mention temporary spikes during compaction is that these
are fully expected to roughly double disk space use. I did not mean to
imply that this is what she is seeing, since she's specifically
waiting for background compaction to complete. Rather, my point was
that as long as we are only talking about roughly a doubling of disk
space, regardless of its cause, it is not worse than what you may
expect anyway, even if temporarily, during active compaction.

I still have no explanation for lingering data, other than obsolete
sstables waiting for the GC to trigger actual file removal. *However*,
and this is what I meant with my follow-up, that still does not
explain the data from her post unless 'nodetool ring' reports total
sstable size rather than the total size of live sstables.

But let's suppose that's wrong and 'nodetool ring' somehow does
include all sstable data; in such a case it seems that the data from
Julie's latest post may be consistent with obsolete sstables, since
the repeated attempts at compaction/cleanup may very well have
triggered a CMS sweep, thus eventually freeing the sstables (while
simply leaving the cluster idle would not have done so normally).

It is also worth noting that this kind of space waste (sstables
hanging around waiting for a GC) is not likely to scale to larger data
sets as the probability of triggering a CMS sweep within a reasonable
period after compaction, increases with the amount of traffic you
throw at the cluster. So if you're e.g. writing 100 gb of data, you're
pretty unlikely to not trigger a CMS sweep within the first few
percent (assuming you don't have a huge 250 gb heap or something). For
this reason (even disregarding that cassandra tries to trigger a GC
when disk runs low) I would not count them as expected to be
"cumulative" on top of any temporary spikes resulting from compaction;
so in effect the idea is that these effects should normally not matter
since you need the disk space to deal with the compaction spikes
anyway, and you are unlikely to have both compaction spikes *and*
these obsolete sstables at the same time even without the GC
triggering cassandra does.

> There is clearly overhead from there being multiple SSTables with multiple
> bloom filters and multiple indexes. But from my understanding, that does not
> fully account for the difference in disk usage she is seeing. If it is 6x
> across the whole cluster, it seems unlikely that the meta information is 5x
> the size of the actual information.

Definitely agreed. I should have made it clearer that I was only
addressing the post.

Julie, are you still seeing 6x and similar factors of wastes in your
re-producable test cases, or was the 6x factor limited to your initial
experience with real data?

The reason I'm barking up this tree is that I figure we may be
observing the results of two independent problems here, and if the
smaller test case can be explained away by lack of GC, then it
probably doesn't help figuring out what happened in the original
problem scenario.

Hmm. Maybe a 'nodetool gc' would be in order to make it easy to
trigger a gc without going for jconsole/JMX manually.

> I haven't been following this thread very closely, but I don't think
> "obsolete" SSTables should be relevant, because she's not doing UPDATE or
> DELETE and she hasn't changed cluster topography (the "cleanup" case).

Even a workload strictly limited to writing new data with unique keys
and column names will cause sstables to become obsolete, as long as
you write enough data to reach the compaction threshold.

Just to be clear, my perhaps misuse of the term "obsolete" only refers
to sstables that have been successfully compacted together with others
into a new sstable which replaces the old ones (hence making them
"obsolete"). I do not mean to imply that they contain obsolete

/ Peter Schuller

View raw message