cassandra-user mailing list archives

From Mohit Anchlia <>
Subject Re: What sort of load do the tombstones create on the cluster?
Date Mon, 21 Nov 2011 19:53:50 GMT
On Mon, Nov 21, 2011 at 11:47 AM, Edward Capriolo <> wrote:
> On Mon, Nov 21, 2011 at 3:30 AM, Philippe <> wrote:
>> I don't remember your exact situation but could it be your network
>> connectivity?
>> I know I've been upgrading mine because I'm maxing out fastethernet on a
>> 12 node cluster.
>> On Nov 20, 2011, at 22:54, "Jahangir Mohammed" <> wrote:
>>> Mostly, they are I/O and CPU intensive during major compaction. If
>>> ganglia doesn't have anything suspicious there, then what is performance
>>> loss ? Read or write?
>>> On Nov 17, 2011 1:01 PM, "Maxim Potekhin" <> wrote:
>>>> In view of my unpleasant discovery last week that deletions in Cassandra
>>>> lead to a very real
>>>> and serious performance loss, I'm working on a strategy of moving
>>>> forward.
>>>> If the tombstones do cause such problem, where should I be looking for
>>>> performance bottlenecks?
>>>> Is it disk, CPU or something else? Thing is, I don't see anything
>>>> outstanding in my Ganglia plots.
>>>> TIA,
>>>> Maxim
> Tombstones do have a performance impact, particularly in cases where the data
> has a lot of turnover and you are using the standard size-tiered compaction
> rather than leveled (LevelDB-style) compaction. Tombstones live on disk for
> gc_grace_seconds. First, a tombstone takes up some small amount of space,
> which has an effect on disk caching. Second, tombstones affect the read path
> through the bloom filters: a read for a row key will now match multiple bloom
> filters, so more SSTables have to be consulted.
> If you are constantly adding and removing data and you have a long
> gc_grace_seconds (10 days is pretty long if your dataset is new every day,
> for example), this is more pronounced than in a use case that rarely deletes.
> This is why some use cases call for 'major compaction' while other people
> believe you should never need it.
> I force majors on some column families because there is high turnover and
> the data needs to be read often, and the difference in data size is the
> difference between 20GB on disk that fits in the VFS cache and 35GB on disk
> that doesn't (and that also may 'randomly' trigger a large compaction at
> peak time).
> I am pretty excited about leveled (LevelDB-style) compaction because it
> looks to be more space efficient.

Have you had a chance to benchmark the leveled (LevelDB-style) compaction?

