cassandra-user mailing list archives

From eugene miretsky <>
Subject Re: How do TTLs generate tombstones
Date Tue, 10 Oct 2017 01:36:54 GMT
Thanks Alain!

We are using TWCS compaction, and I read your blog multiple times - it was
very useful, thanks!

We are seeing a lot of overlapping SSTables, which leads to several problems:
(a) a large number of tombstones read in queries, (b) high CPU usage, and (c)
fairly long Young Gen GC pauses (300ms).

We have read_repair_chance = 0, unchecked_tombstone_compaction = true, and
gc_grace_seconds = 3h, but we read and write with consistency level ONE.
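
For reference, the table options above could be written roughly as the CQL
below. This is only a sketch: the keyspace and table names are placeholders,
and the one-day compaction window is an assumption, not something stated in
this thread.

```cql
-- Hypothetical keyspace/table names; adjust to your schema.
ALTER TABLE my_keyspace.my_timeseries
WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',         -- assumed window unit
    'compaction_window_size': '1',            -- assumed window size
    'unchecked_tombstone_compaction': 'true'
}
AND gc_grace_seconds = 10800                  -- 3 hours, in seconds
AND read_repair_chance = 0.0;
```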

I'm suspecting the overlap is coming from either hinted handoff or a repair
job we run nightly.

1) Is running repair with TWCS recommended? It seems like it will always
create a neverending overlap (the repair SSTable will have data from all 24
hours), an effect that seems to get amplified with anti-compaction.
2) TWCS seems to introduce a tradeoff between eventual consistency and
write/read availability. If all repairs are turned off, then the choice is
either (a) use a strong consistency level, and pay the price of lower
availability and slower reads or writes, or (b) use a lower consistency
level, and risk inconsistent data (data is never repaired).

I will try your last link, but reappearing data sounds a bit scary :)

Any advice on how to debug this further would be greatly appreciated.


On Fri, Oct 6, 2017 at 11:02 AM, Alain RODRIGUEZ <> wrote:

> Hi Eugene,
>> If we never use updates (time series data), is it safe to set
>> gc_grace_seconds=0.
> As Kurt pointed out, you never want 'gc_grace_seconds' to be lower than
> 'max_hint_window_in_ms', as the min of these 2 values is used for the hints
> storage window size in Apache Cassandra.
> Yet time series data with fixed TTLs allows a very efficient use of
> Cassandra, especially when using Time Window Compaction Strategy (TWCS).
> Fun fact: Jeff brought it to Apache Cassandra :-). I would
> definitely give it a try.
> Here is a post from my colleague Alex that I believe could be useful in
> your case:
> Using TWCS and lowering 'gc_grace_seconds' to the value of
> 'max_hint_window_in_ms' should be really effective. Make sure to use a
> strong consistency level (generally RF = 3, CL.Read = CL.Write =
> LOCAL_QUORUM) to prevent inconsistencies, I would say (depending on your
> interest in consistency).
> This way you could expire entire SSTables, without compaction. If
> overlaps in SSTables become a problem, you could even consider trying
> a more aggressive SSTable expiration:
> jira/browse/CASSANDRA-13418.
> C*heers,
> -----------------------
> Alain Rodriguez - @arodream -
> France / Spain
> The Last Pickle - Apache Cassandra Consulting
> 2017-10-05 23:44 GMT+01:00 kurt greaves <>:
>> No, it's never safe to set it to 0, as you'll disable hinted handoff for
>> the table. If you are never doing updates or manual deletes and you always
>> insert with a TTL, you can get away with setting it to the hinted handoff
>> period.
>> On 6 Oct. 2017 1:28 am, "eugene miretsky" <>
>> wrote:
>>> Thanks Jeff,
>>> Makes sense.
>>> If we never use updates (time series data), is it safe to set
>>> gc_grace_seconds=0?
>>> On Wed, Oct 4, 2017 at 5:59 PM, Jeff Jirsa <> wrote:
>>>> The TTL'd cell is treated as a tombstone. gc_grace_seconds applies to
>>>> TTL'd cells, because even though the data is TTL'd, it may have been
>>>> written on top of another live cell that wasn't ttl'd:
>>>> Imagine a test table, simple key->value (k, v).
>>>> INSERT INTO table(k,v) values(1,1);
>>>> Kill 1 of the 3 nodes
>>>> UPDATE table USING TTL 60 SET v=1 WHERE k=1 ;
>>>> 60 seconds later, the live nodes will see that data as deleted, but
>>>> when that dead node comes back to life, it needs to learn of the deletion.
>>>> On Wed, Oct 4, 2017 at 2:05 PM, eugene miretsky <> wrote:
>>>>> Hello,
>>>>> The following link says that TTLs generate tombstones -
>>>>> What exactly is the process that converts the TTL into a tombstone?
>>>>>    1. Is an actual new tombstone cell created when the TTL expires?
>>>>>    2. Or, is the TTLed cell treated as a tombstone?
>>>>> Also, does gc_grace_seconds have an effect on TTLed cells?
>>>>> gc_grace_seconds is meant to protect from deleted data re-appearing if
>>>>> a tombstone is compacted away before all nodes have reached a consistent
>>>>> state. However, since the TTL is stored in the cell (in liveness_info),
>>>>> there is no way for the cell to re-appear (the TTL will still be there).
>>>>> Cheers,
>>>>> Eugene
