incubator-cassandra-user mailing list archives

From: Sylvain Lebresne <>
Subject: Re: is node tool row count always way off?
Date: Thu, 27 Sep 2012 09:38:45 GMT

> The node tool cfstats, what is the row count estimate usually off by
> (what percentage? Or what absolute number?)

It will likely not be very good, but it is supposed to give some order
of magnitude. That being said, there are at least the following sources
of error:
 - It counts deleted rows that have not been GCed (i.e. that have been
deleted less than gc_grace seconds ago).
 - It estimates the number of rows for each sstable and sums those. If
you have a relatively low number of rows that you overwrite a lot, the
estimate can be at least nb_sstables times higher than reality.
 - For an sstable, it estimates the number of rows using the in-memory
index summary. That summary holds one entry for every 128 rows, so we
take the summary size and multiply by 128, which means a +/- 128 error
range per sstable. In practice, when you don't have a trivial number of
rows, which is the common case, this is a pretty good estimate. In your
case, however, you probably have 1 or 2 rows per sstable; each of those
yields one index bucket and is thus counted as 128 (and 128 * 3 = 384,
so the math adds up). See the sketch just below this list.
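
To make the arithmetic concrete, here is a rough sketch in Java of how
that estimate is computed (this is not the actual Cassandra code; the
names and the hard-coded 128 interval are just assumptions for the
example):

class RowCountSketch {
    // The index summary samples one row key out of every 128 rows.
    static final int INDEX_INTERVAL = 128;

    // Per-sstable estimate: summary size * sampling interval. An
    // sstable holding only 1 or 2 rows still has one summary entry,
    // so it is counted as 128.
    static long estimatedRows(int indexSummarySize) {
        return (long) indexSummarySize * INDEX_INTERVAL;
    }

    // The cfstats number just sums the per-sstable estimates, so rows
    // that are overwritten, or deleted but still within gc_grace, are
    // counted once per sstable that contains them.
    static long estimatedRows(int[] summarySizes) {
        long total = 0;
        for (int size : summarySizes)
            total += estimatedRows(size);
        return total;
    }

    public static void main(String[] args) {
        // Three sstables with 1 or 2 rows each: one summary entry
        // apiece, hence 3 * 128 = 384, the number cfstats reports.
        System.out.println(estimatedRows(new int[]{ 1, 1, 1 }));
    }
}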

So don't rely too much on that number, but excluding the case of a very
low number of rows, and provided you're not too far behind on
compaction, it does give an order of magnitude.

> An SSTable of 3 sounds very weird….

It's not. SizeTieredCompaction never compacts fewer than
min_compaction_threshold sstables, and min_compaction_threshold is 4 by
default. That threshold is configurable, but 4 is probably a good value
for any CF that has a non-trivial write load.
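
For illustration, the core of that decision looks roughly like this (a
simplification, not the real SizeTieredCompactionStrategy code; the
names here are made up for the example):

import java.util.Collections;
import java.util.List;

class SizeTieredSketch {
    // Sstables of similar size are grouped into a bucket; a bucket is
    // only handed to compaction once it holds at least
    // min_compaction_threshold sstables (4 by default). A bucket of 3
    // is simply left alone, which is why seeing 3 live sstables for a
    // CF is perfectly normal.
    static <T> List<T> compactionCandidates(List<T> bucket, int minCompactionThreshold) {
        return bucket.size() < minCompactionThreshold
             ? Collections.<T>emptyList()
             : bucket;
    }
}

If you do want to change the threshold for a given CF, nodetool
setcompactionthreshold is one way to do it.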

