incubator-cassandra-user mailing list archives

From Peter Schuller
Subject Re: Index interval tuning
Date Mon, 09 May 2011 15:58:53 GMT
> I have a few sstables with around 500 million keys, and memory usage has
> grown a lot, I suppose because of the indexes. These sstables consist
> of skinny rows, but a lot of them. Would tuning the index interval
> make the memory usage go down? And what would the performance hit be?

Assuming no row caching, and assuming you're talking about heap usage
and not the virtual size of the process in top, the primary two things
that will grow with row count are (1) bloom filters for sstables and
(2) the sampled index keys. Bloom filters are of a certain size to
achieve a sufficiently small false positive rate. That target rate
could be increased to allow smaller bloom filters, but that is not
exposed as a configuration option and would require code changes.
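
To give a feel for how bloom filter size trades off against the false
positive rate, here is a rough sketch using the textbook formula for an
optimally sized bloom filter, m = -n * ln(p) / (ln 2)^2 bits. This is not
Cassandra's actual sizing code, and the function names and example rates
are only illustrative; the real implementation may round differently.

```python
import math


def bloom_filter_bits(num_keys: int, false_positive_rate: float) -> int:
    """Bits needed by an optimally sized textbook bloom filter.

    Assumption: classic formula m = -n * ln(p) / (ln 2)^2, not the
    sizing logic Cassandra itself uses.
    """
    if num_keys <= 0:
        raise ValueError("num_keys must be positive")
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")
    bits = -num_keys * math.log(false_positive_rate) / (math.log(2) ** 2)
    return math.ceil(bits)


def bits_to_mib(bits: int) -> float:
    """Convert a bit count to mebibytes."""
    return bits / 8 / 2**20


if __name__ == "__main__":
    keys = 500_000_000  # row count from the question
    for rate in (0.001, 0.01, 0.1):  # hypothetical target rates
        size = bits_to_mib(bloom_filter_bits(keys, rate))
        print(f"target rate {rate}: about {size:.0f} MiB")
```

Under this formula a 1% target costs roughly 9.6 bits per key, and every
tenfold increase in the acceptable false positive rate saves about 4.8
bits per key.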

For key sampling, the primary performance penalty should be CPU and
maybe some disk. On average, when looking up a key in an sstable index
file, you'll read (sampling interval)/2 entries and deserialize them
before finding the one you're after. Increasing the sampling interval
will thus increase the amount of deserialization taking place, and
will also make the range of index data read per lookup span more pages
on disk. The impact on disk is difficult to judge and likely depends a
lot on i/o scheduling and other details.
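
The trade-off can be put in numbers with a small sketch. It assumes one
key is held in memory per interval and that a lookup scans forward from
the nearest sample, as described above. The function names and the
interval values tried are just examples, and the per-sample heap cost
is left out because it depends on key size and JVM overhead.

```python
def sampled_key_count(num_keys: int, index_interval: int) -> int:
    """Number of index entries kept in memory (one per interval)."""
    if num_keys < 0 or index_interval <= 0:
        raise ValueError("num_keys must be >= 0 and index_interval > 0")
    # Integer ceiling division, avoids float rounding on large counts.
    return -(-num_keys // index_interval)


def expected_entries_scanned(index_interval: int) -> float:
    """Average index entries read and deserialized per lookup."""
    if index_interval <= 0:
        raise ValueError("index_interval must be positive")
    return index_interval / 2


if __name__ == "__main__":
    keys = 500_000_000  # row count from the question
    for interval in (128, 256, 512):  # example intervals only
        samples = sampled_key_count(keys, interval)
        scanned = expected_entries_scanned(interval)
        print(f"interval {interval}: {samples:,} samples in memory, "
              f"about {scanned:.0f} entries scanned per lookup")
```

Doubling the interval halves the number of sampled keys held on the
heap and doubles the average number of entries deserialized per lookup.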

/ Peter Schuller
