cassandra-user mailing list archives

From "" <>
Subject RE: Tombstoned data seems to remain after compaction
Date Wed, 13 Dec 2017 00:03:48 GMT
Hi Kurt,

Thank you very much for your reply.
Well, I’ll try it on a test environment first, just in case ☺


From: kurt greaves []
Sent: Wednesday, December 13, 2017 8:48 AM
To: User <>
Subject: Re: Tombstoned data seems to remain after compaction

As long as you've limited the throughput of compactions you should be fine (by default it's
16 MB/s; this can be changed through nodetool setcompactionthroughput or in the yaml). It
will be no different from any other compaction occurring; the compaction will just take longer.
You should be aware, however, that a major compaction can use up to double the disk space currently
utilised by that table. Considering you've got lots of tombstones it will probably be a lot
less than double, but it will still be significant, so ensure you have enough free space for
the compaction to complete.
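As a quick sanity check before kicking off a major compaction, the "up to double the disk space" rule of thumb above can be sketched as follows. This is only an illustrative sketch; the data directory path, table size, and the 2.0 headroom factor are assumptions, not details from this thread:

```python
import shutil

def can_major_compact(data_dir: str, table_bytes: int, headroom: float = 2.0) -> bool:
    """Return True if the volume holding `data_dir` has enough free space
    for a major compaction, using the worst-case rule of thumb that the
    rewrite may temporarily need up to `headroom` x the table's current size."""
    free = shutil.disk_usage(data_dir).free
    return free >= table_bytes * headroom

# Example: a 100 GiB table would need roughly 200 GiB free in the worst case.
```

With lots of tombstones the output SSTable will be much smaller than the input, but budgeting for the worst case avoids failing mid-compaction.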

On 12 December 2017 at 07:44,<> <<>>
Hi Jeff, Kurt

Thanks again for your advice.

Among the valuable ideas you provided, I am thinking of executing nodetool compact,
because it is the simplest one to try and I’m really a novice with Cassandra.

One thing I’m concerned about with this plan is that the major compaction might
have a serious impact on our production system, which uses Cassandra as a cache
store for web sessions and similar data.

We use a Cassandra ring with three nodes, replicating to all three, and we use the
QUORUM consistency level for data updates.

Under the conditions above, are there any risks if I execute a major compaction
on each node, one by one? For example, could the whole system’s throughput get
seriously worse?

I know I’m asking a difficult question, because the impact differs depending on
each environment, but your general advice would be highly appreciated!
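For reference, running the major compaction node by node could be sketched roughly as the loop below. This is a hedged operational sketch, not a procedure from this thread: the keyspace name, table name, and host list are placeholders, and the throughput value is the default mentioned by Kurt.

```shell
#!/bin/sh
# Rolling major compaction sketch; KS, TBL, and hosts are placeholders.
KS=my_keyspace
TBL=my_table
for HOST in node1 node2 node3; do
    # Throttle compaction I/O on this node (value in MB/s).
    nodetool -h "$HOST" setcompactionthroughput 16
    # Run the major compaction for just this table on this node.
    nodetool -h "$HOST" compact "$KS" "$TBL"
    # Check progress before moving on to the next node.
    nodetool -h "$HOST" compactionstats
done
```

Doing one node at a time keeps two replicas unburdened, so QUORUM reads and writes can still be served with normal latency from the other two nodes.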


From: Jeff Jirsa [<>]
Sent: Tuesday, December 12, 2017 2:35 AM
To: cassandra <<>>
Subject: Re: Tombstoned data seems to remain after compaction

Hello Takashima,

Answers inline.

On Sun, Dec 10, 2017 at 11:41 PM,<> <<>>
Hi Jeff

I appreciate your detailed explanation :)

>  Expired data gets purged on compaction as long as it doesn’t overlap with other live
data. The overlap thing can be difficult to reason about, but it’s meant to ensure correctness
in the event that you write a value with ttl 180, then another value with ttl 1, and you don’t
want to remove the value with ttl1 until you’ve also removed the value with ttl180, since
it would lead to data being resurrected

I understand that a TTL setting sometimes does not work as we expect, especially when we alter
the value afterward, because of Cassandra’s data-consistency functionality. My understanding

If by "does not work as you expect" you mean "data is not cleared immediately upon expiration",
that is correct.

And I am thinking of trying the sstablesplit utility to make Cassandra do a minor compaction,
because one of the SSTables is the oldest and very large, and I want to compact it.

That is offline and requires downtime, which is usually not something you want to do if you
can avoid it.

Instead, I recommend you consider the tombstone compaction subproperties, which
let you force single-SSTable compactions based on tombstone percentage (and set that threshold
low enough that it reclaims the space you want to reclaim).
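As a sketch of what that could look like (the keyspace and table names are placeholders, and 0.1 is just an illustrative threshold, not a recommendation from this thread):

```sql
-- Lower the tombstone threshold so STCS considers a single SSTable for
-- compaction once ~10% of it is tombstones, and allow such compactions
-- even when the SSTable overlaps with others.
ALTER TABLE my_keyspace.my_table
  WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'tombstone_threshold': '0.1',
    'unchecked_tombstone_compaction': 'true'
  };
```

Note that with unchecked_tombstone_compaction enabled, Cassandra skips the overlap check Jeff described earlier, so expired data can be purged from a single SSTable at the cost of some extra compaction work.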

Perhaps counterintuitively, compaction is most effective at freeing up space when it makes
one very big file, compared to lots of little files - sstablesplit is probably not a good
idea. A major compaction may help, if you have the extra IO and disk space.

Again, though, you should probably consider using something other than STCS going forward.
