cassandra-commits mailing list archives

From "mck (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
Date Sun, 27 Aug 2017 21:23:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16142240#comment-16142240 ]

mck edited comment on CASSANDRA-13418 at 8/27/17 9:22 PM:
----------------------------------------------------------

{quote}N.B: I tried to apply the style guide found in .idea/codeStyleSettings.xml, but it changes
a lot of things for me. Do you know if it is up to date?{quote}
I don't use IntelliJ, so I can't answer that for you, sorry. [~krummas]?
Otherwise you can ask on IRC (#cassandra) or on the user mailing list.

{quote}I enable uncheckedTombstoneCompaction when ignoreOverlaps is activated{quote}
I'm -1 on this for the moment. While the logic holds, as you explain, it's not intuitive
for the user: they have to know that this happens (via the docs or the code). I'd be more
comfortable expecting users of an advanced toggle like this (which requires both a system
property and a table option) to understand the difference between {{uncheckedTombstoneCompaction}}
and {{unsafe_aggressive_sstable_expiration}} and to enable both. Any smarts can be added later on,
with further user feedback and experience.

Could we, instead of setting {{uncheckedTombstoneCompaction}}, log a warning telling the user
that they probably want {{uncheckedTombstoneCompaction}} set as well?
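A minimal sketch of that warning, assuming the compaction options reach the strategy as a plain string map. This is hypothetical Java, not Cassandra's actual validation code: the class TwcsOptionCheck and its checkOptions method are invented for illustration, and only the option names unsafe_aggressive_sstable_expiration and unchecked_tombstone_compaction come from this thread.

```java
import java.util.Map;
import java.util.Optional;
import java.util.logging.Logger;

/**
 * Hypothetical TWCS option check. Only the two option names come from the
 * CASSANDRA-13418 discussion; the class and method are illustrative.
 */
final class TwcsOptionCheck {
    private static final Logger LOG = Logger.getLogger(TwcsOptionCheck.class.getName());

    static final String IGNORE_OVERLAPS = "unsafe_aggressive_sstable_expiration";
    static final String UNCHECKED_TOMBSTONE = "unchecked_tombstone_compaction";

    private TwcsOptionCheck() {}

    /**
     * Logs a warning when overlaps are ignored but unchecked tombstone compaction is off.
     * The options map is only read, never modified, so the user's settings stay exactly
     * as they configured them.
     *
     * @return the warning message, if one was logged
     */
    static Optional<String> checkOptions(Map<String, String> options) {
        boolean ignoreOverlaps = Boolean.parseBoolean(options.getOrDefault(IGNORE_OVERLAPS, "false"));
        boolean uncheckedTombstones = Boolean.parseBoolean(options.getOrDefault(UNCHECKED_TOMBSTONE, "false"));

        if (ignoreOverlaps && !uncheckedTombstones) {
            String message = IGNORE_OVERLAPS + " is enabled but " + UNCHECKED_TOMBSTONE
                    + " is not; you probably want " + UNCHECKED_TOMBSTONE + " = true as well";
            LOG.warning(message);
            return Optional.of(message);
        }
        return Optional.empty();
    }
}
```

Returning the message as well as logging it keeps the check easy to unit test, and because nothing is switched on implicitly, the behaviour the user gets is exactly what they configured.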


was (Author: michaelsembwever):
The same text as the first two paragraphs of the edited comment above, without the final
suggestion to log a warning.

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-13418
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: Corentin Chary
>              Labels: twcs
>         Attachments: twcs-cleanup.png
>
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If you really
> want read repairs, you're going to have SSTables blocking the expiration of other fully
> expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a very low
> value, and that will purge the blockers of old data that should already have expired, thus
> removing the overlaps and allowing the other SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have time series,
> you might not care if all your data doesn't expire at exactly the right time, or if data
> reappears for some time, as long as it gets deleted as soon as it can. In this situation I
> believe it would be really beneficial to allow users to simply ignore overlapping SSTables
> when looking for fully expired ones.
> As to why you would need read repairs:
> - Full repairs take longer than the TTL of the data in my dataset, so they aren't
> really effective.
> - Even with a 10% chance of doing a read repair, we found that this was enough to
> greatly reduce the entropy of the most-used data (and if you have time series, you're
> likely to have a dashboard running the same important queries over and over again).
> - LOCAL_QUORUM is too expensive (it needs >3 replicas), and QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on our system
> and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
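The overlap check that the issue description wants to bypass can be illustrated with a simplified sketch. This is hypothetical Java, not Cassandra's compaction code: SSTableInfo, ExpiredSSTableSketch and fullyExpired are invented names, and the rule shown (an expired SSTable is dropped only if its newest write is older than the oldest write in every overlapping SSTable, so none of its tombstones can still be shadowing live data) is a simplified model of the overlap rule discussed here. The ignoreOverlaps flag stands in for the proposed toggle.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, minimal view of an sstable: just the metadata this check reads. */
final class SSTableInfo {
    final String name;
    final long minTimestamp;         // oldest write timestamp in the file (microseconds)
    final long maxTimestamp;         // newest write timestamp in the file (microseconds)
    final int maxLocalDeletionTime;  // seconds; once below gcBefore, every cell in the file has expired

    SSTableInfo(String name, long minTimestamp, long maxTimestamp, int maxLocalDeletionTime) {
        this.name = name;
        this.minTimestamp = minTimestamp;
        this.maxTimestamp = maxTimestamp;
        this.maxLocalDeletionTime = maxLocalDeletionTime;
    }
}

final class ExpiredSSTableSketch {
    private ExpiredSSTableSketch() {}

    /**
     * Returns the candidates that could be dropped whole.
     *
     * @param candidates     sstables being considered for dropping
     * @param overlapping    other live sstables whose token ranges overlap the candidates
     * @param gcBefore       deletion-time cutoff in seconds
     * @param ignoreOverlaps stand-in for the proposed toggle: skip the overlap check entirely
     */
    static List<SSTableInfo> fullyExpired(List<SSTableInfo> candidates,
                                          List<SSTableInfo> overlapping,
                                          int gcBefore,
                                          boolean ignoreOverlaps) {
        // Oldest write in any overlapping sstable; MAX_VALUE means "nothing can block".
        long minOverlapTimestamp = Long.MAX_VALUE;
        if (!ignoreOverlaps) {
            for (SSTableInfo other : overlapping) {
                minOverlapTimestamp = Math.min(minOverlapTimestamp, other.minTimestamp);
            }
        }

        List<SSTableInfo> droppable = new ArrayList<>();
        for (SSTableInfo candidate : candidates) {
            boolean expired = candidate.maxLocalDeletionTime < gcBefore;
            // Only safe if all overlapping data is newer than the candidate's newest write;
            // otherwise a tombstone in the candidate might still shadow it, and dropping
            // the file could make that older data reappear.
            boolean shadowsNothing = candidate.maxTimestamp < minOverlapTimestamp;
            if (expired && shadowsNothing) {
                droppable.add(candidate);
            }
        }
        return droppable;
    }
}
```

In this model, a read-repaired SSTable that received writes with old timestamps (a low minTimestamp) blocks the drop unless ignoreOverlaps is set. With the toggle on, the expired file goes away immediately, which is the trade-off the issue accepts: older data shadowed by its tombstones may reappear until it expires on its own.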



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
