cassandra-commits mailing list archives

From "Jonathan Owens (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
Date Mon, 05 Jun 2017 18:51:04 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037380#comment-16037380
] 

Jonathan Owens commented on CASSANDRA-13418:
--------------------------------------------

We're chasing what may be a gotcha in our implementation of this. We have one cluster that
does regular incremental repairs and is ending up with a whole lot of duplicated data across
sstables, which we guess is due to overstreaming. Explicitly ignoring overlap is awesome for
compacting away tombstones, but does nothing to detect duplicate partitions across sstables on
disk. And because TWCS uses largest-timestamp to bucket, sstables containing older data that
was streamed later will never appear in the same compaction operation as the sstable that data
"should have" been written to the first time. CASSANDRA-10496 would resolve this eventually by
pushing that older data into the correct bucket, but we need a workaround sooner.
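
To make the bucketing point concrete, here is a minimal sketch, not Cassandra's actual code. The
SSTable class, the one-hour window, and the timestamps are all made up for illustration. It only
shows that bucketing on the largest timestamp puts a late-streamed sstable in a newer window even
though most of its data is old.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 3600  # hypothetical one-hour TWCS window


@dataclass(frozen=True)
class SSTable:
    """Illustrative stand-in for sstable metadata; not a Cassandra class."""
    name: str
    min_ts: int  # smallest write timestamp in the sstable (seconds)
    max_ts: int  # largest write timestamp in the sstable (seconds)


def window_start(ts, window=WINDOW_SECONDS):
    """Round a timestamp down to the start of its time window."""
    return ts - (ts % window)


def bucket_by_max_timestamp(sstables, window=WINDOW_SECONDS):
    """Group sstable names by the window containing their largest timestamp."""
    buckets = {}
    for s in sstables:
        buckets.setdefault(window_start(s.max_ts, window), []).append(s.name)
    return buckets


# "original" was flushed during the first window. "streamed" arrived via repair
# later and carries mostly first-window data plus a little newer data.
example = [
    SSTable("original", min_ts=0, max_ts=3500),
    SSTable("streamed", min_ts=100, max_ts=7300),
]
buckets = bucket_by_max_timestamp(example)
```

Here "original" lands in the window starting at 0 and "streamed" in the window starting at 7200,
so the two are never candidates for the same window compaction despite holding overlapping data.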

We're contemplating a few options:
* I remember, or imagined, a ticket to try to suss out overlapping sstables and include them
in the current compaction operation if found, rather than cancelling the operation. That seems
good here, because in TWCS you should not have many overlaps, and if you do they need to be
addressed somehow or you end up with duplicates.
* We could switch to cassandra-reaper or something similar and do higher-precision repairs
to reduce overstreaming, though that's a lot of work to fix what seems really like a compaction
artifact.
* Reverting the change would put us back in the world where tombstones don't expire due to
overlap checks failing, so that's out.
* We can write an external tool to detect overlaps and issue user-defined compactions against
them, but that seems really yucky. 
* We could never run incremental repairs and rely only on higher consistency levels on write/read,
and let read repair do the work. This only reduces the magnitude of the problem; it does not
fix it.
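
For the external-tool option, the detection half could look roughly like the sketch below. It
assumes the min/max timestamps per sstable have already been collected somehow (for instance from
sstablemetadata output), and it treats "overlap" purely as overlapping timestamp ranges, which is
a simplification. SSTableMeta and the file names are hypothetical.

```python
from itertools import combinations
from typing import NamedTuple


class SSTableMeta(NamedTuple):
    """Hypothetical per-sstable metadata gathered by an external tool."""
    path: str
    min_ts: int
    max_ts: int


def ranges_overlap(a, b):
    """True when the two closed timestamp ranges share at least one point."""
    return a.min_ts <= b.max_ts and b.min_ts <= a.max_ts


def find_overlapping_pairs(sstables):
    """Return (path, path) pairs of sstables whose timestamp ranges overlap."""
    ordered = sorted(sstables, key=lambda s: s.path)
    return [
        (a.path, b.path)
        for a, b in combinations(ordered, 2)
        if ranges_overlap(a, b)
    ]


tables = [
    SSTableMeta("a-Data.db", 0, 3500),
    SSTableMeta("b-Data.db", 100, 7300),
    SSTableMeta("c-Data.db", 8000, 9000),
]
pairs = find_overlapping_pairs(tables)
```

Each returned pair would be a candidate for a user-defined compaction. The pairwise check is
quadratic in the number of sstables, which is probably fine per table but is part of why this
feels yucky.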

I still believe this patch is a good idea, as optimizing for tombstone expiry is essential
with TWCS, but the repair interaction here is worth pointing out.


> Allow TWCS to ignore overlaps when dropping fully expired sstables
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-13418
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: Corentin Chary
>              Labels: twcs
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If you really
> want read-repairs you're going to have sstables blocking the expiration of other fully expired
> SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a very low
> value and that will purge the blockers of old data that should already have expired, thus
> removing the overlaps and allowing the other SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have time series,
> you might not care if your data doesn't all expire at exactly the right time, or if data
> re-appears for some time, as long as it gets deleted as soon as it can. And in this situation
> I believe it would be really beneficial to allow users to simply ignore overlapping SSTables
> when looking for fully expired ones.
> To the question: why would you need read-repairs?
> - Full repairs basically take longer than the TTL of the data on my dataset, so this
>   isn't really effective.
> - Even with a 10% chance of doing a repair, we found out that this would be enough to
>   greatly reduce entropy of the most used data (and if you have timeseries, you're likely to
>   have a dashboard doing the same important queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on our system
> and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org

