cassandra-commits mailing list archives

From "mck (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10496) Make DTCS/TWCS split partitions based on time during compaction
Date Tue, 05 Sep 2017 02:28:03 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16153049#comment-16153049
] 

mck commented on CASSANDRA-10496:
---------------------------------

[~iksaif],
a few comments:
 - i suspect [~krummas] is keen to see a patch that splits partitions, even though a solution
that doesn't split them still has a lot to offer.
 - changing locations isn't supported. see how i paired it with the writer in my experiment
above.
 - i don't think you want to create the SSTableWriters multiple times.
 - Marcus' original idea was to create only two sstables per TWCS window. is that still possible?
 - shouldn't the bucket be based on the maxTimestamp? see `getBuckets(..)` and `newestBucket(..)`.
 - is it correct that the idea is that, as "old" sstables are split out, they would later get
re-compacted with their original bucket, and that the domino effect this could cause (re-compacting
older buckets) could be avoided by increasing minThreshold to 3?


> Make DTCS/TWCS split partitions based on time during compaction
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-10496
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10496
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>              Labels: dtcs
>             Fix For: 4.x
>
>
> To avoid getting old data in new time windows with DTCS (or related, like [TWCS|CASSANDRA-9666]), we need to split out old data into its own sstable during compaction.
> My initial idea is to just create two sstables, when we create the compaction task we state the start and end times for the window, and any data older than the window will be put in its own sstable.
> By creating a single sstable with old data, we will incrementally get the windows correct - say we have an sstable with these timestamps:
> {{[100, 99, 98, 97, 75, 50, 10]}}
> and we are compacting in window {{[100, 80]}} - we would create two sstables:
> {{[100, 99, 98, 97]}}, {{[75, 50, 10]}}, and the first window is now 'correct'. The next compaction would compact in window {{[80, 60]}} and create sstables {{[75]}}, {{[50, 10]}} etc.
> We will probably also want to base the windows on the newest data in the sstables so that we actually have older data than the window.
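
The incremental split described in the issue can be sketched as follows. This is only an illustration of the arithmetic on plain timestamp lists, not Cassandra code: the function name `split_by_window` is hypothetical, and treating the window's lower bound as inclusive is an assumption, since the description does not say which side a boundary timestamp falls on.

```python
from typing import List, Tuple


def split_by_window(timestamps: List[int], window_start: int) -> Tuple[List[int], List[int]]:
    """Partition timestamps into (in-window, older-than-window).

    window_start is the oldest timestamp that still belongs to the window
    being compacted (assumed inclusive; hypothetical helper, not a Cassandra API).
    """
    in_window = [ts for ts in timestamps if ts >= window_start]
    older = [ts for ts in timestamps if ts < window_start]
    return in_window, older


# First compaction: window [100, 80] over the example sstable.
current, old = split_by_window([100, 99, 98, 97, 75, 50, 10], 80)

# Next compaction: window [80, 60], applied to the "old" sstable from above.
next_current, next_old = split_by_window(old, 60)
```

Each pass leaves one sstable that fits its window and pushes everything older into a single second sstable, which is the "two sstables per compaction" idea from the description.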



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org

