cassandra-commits mailing list archives

From "Marcus Eriksson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8004) Run LCS for both repaired and unrepaired data
Date Tue, 07 Oct 2014 12:19:33 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161824#comment-14161824
] 

Marcus Eriksson commented on CASSANDRA-8004:
--------------------------------------------

Pushed a branch here for this: https://github.com/krummas/cassandra/commit/476b27dc503c3541ee31dacdd70191fee8a819a5

* Introduces a "WrappingCompactionStrategy" that contains the logic for handling repaired/unrepaired
sstables.
** This could be a bit confusing and should probably be refactored for 3.0. It would be nicer
with a "CompactionStrategyManager" or similar that does not extend AbstractCompactionStrategy,
but we currently call cfs.getCompactionStrategy() in many places, so having the
WrappingCompactionStrategy (WCS) makes the change transparent to any users.
* As mentioned in the description, this makes it possible, for the first run, to move sstables
from the leveling in unrepaired straight over to the repaired leveling. After the first run,
we still try to move sstables over; if that fails, they are sent to L0.
* Keeps two instances of the same compaction strategy; changing the compaction strategy is now
handled by WrappingCompactionStrategy.
* The compaction strategies now track which sstables they can run compaction on (LCS always
did this; now STCS does it as well), so each compaction strategy instance will only ever see
either repaired or unrepaired sstables.
* As mentioned in CASSANDRA-5351 (and the original reason we did STCS on the unrepaired data),
write amplification gets a lot higher with two parallel levelings, so maybe we should
have an option to configure the two compaction strategies separately. You could then configure
STCS for the unrepaired data and LCS for the repaired data if the write amplification gets
too high for the use case.
* An added benefit of running LCS for the unrepaired data is that it makes each sstable contain
a smaller range - making it more likely that the sstable is fully contained within the repaired
range and the anticompaction step can simply update the repairedAt timestamp and not have
to rewrite the entire sstable to split out the repaired ranges.
* Also handles the case where someone runs incremental repair once and then forgets about
it. In the current implementation, all the data would then end up size tiered; with this
change there will be a small/old repaired leveling and a big unrepaired leveling.
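
To illustrate the anticompaction point above, here is a minimal Java sketch of the decision, under stated assumptions. It is not the code in the branch: AnticompactionSketch, TokenRange, Action and decide are hypothetical names, tokens are plain longs, and ranges are closed and non-wrapping (real Cassandra ranges are (left, right] and can wrap). An sstable that sits entirely inside one repaired range only needs its repairedAt updated; one that partially overlaps has to be rewritten and split.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch only; these names are illustrative, not Cassandra's API. */
final class AnticompactionSketch {
    /** Simplified closed, non-wrapping token interval [left, right]. */
    static final class TokenRange {
        final long left;
        final long right;

        TokenRange(long left, long right) {
            this.left = left;
            this.right = right;
        }

        boolean contains(TokenRange other) {
            return left <= other.left && other.right <= right;
        }

        boolean intersects(TokenRange other) {
            return left <= other.right && other.left <= right;
        }
    }

    enum Action { MARK_REPAIRED, REWRITE_AND_SPLIT, LEAVE_UNREPAIRED }

    /**
     * Decide what to do with one sstable after a repair of the given ranges.
     * Simplification: an sstable covered only by the union of several repaired
     * ranges is treated as a partial overlap and rewritten.
     */
    static Action decide(TokenRange sstable, List<TokenRange> repairedRanges) {
        boolean overlaps = false;
        for (TokenRange repaired : repairedRanges) {
            if (repaired.contains(sstable)) {
                return Action.MARK_REPAIRED; // cheap path: only update repairedAt
            }
            if (repaired.intersects(sstable)) {
                overlaps = true;
            }
        }
        return overlaps ? Action.REWRITE_AND_SPLIT : Action.LEAVE_UNREPAIRED;
    }

    public static void main(String[] args) {
        List<TokenRange> repaired = Arrays.asList(new TokenRange(0, 100));
        System.out.println(decide(new TokenRange(10, 20), repaired));   // MARK_REPAIRED
        System.out.println(decide(new TokenRange(90, 150), repaired));  // REWRITE_AND_SPLIT
        System.out.println(decide(new TokenRange(200, 300), repaired)); // LEAVE_UNREPAIRED
    }
}
```

The narrower the token span of each sstable, the more often the first branch is taken, which is the benefit of leveling the unrepaired data described above.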

Thoughts, comments?

> Run LCS for both repaired and unrepaired data
> ---------------------------------------------
>
>                 Key: CASSANDRA-8004
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8004
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>              Labels: compaction
>             Fix For: 2.1.1
>
>
> If a user has leveled compaction configured, we should run that for both the unrepaired
> and the repaired data. I think this would make things a lot easier for end users.
> It would simplify migration to incremental repairs as well: if a user runs incremental
> repair on their nicely leveled unrepaired data, we won't need to drop it all to L0; instead we
> can just start moving sstables from the unrepaired leveling straight into the repaired leveling.
> One idea is to have two instances of LeveledCompactionStrategy and move sstables between
> the instances after an incremental repair run (and let LCS be totally oblivious to whether
> it handles repaired or unrepaired data). The same should probably apply to any compaction strategy:
> run two instances and remove all repaired/unrepaired logic from the strategy itself.
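
The two-instance idea in the description can be sketched roughly as follows. This is a toy model under stated assumptions, not the WrappingCompactionStrategy from the branch: TwoInstanceSketch, SSTable, Strategy, Wrapper and markRepaired are hypothetical names, and the "keep the level unless it overlaps" check is a stand-in for the real leveled manifest logic. The wrapper routes each sstable by its repairedAt value, and when an sstable moves to the repaired instance it keeps its level if possible and otherwise falls back to L0.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the two-instance idea; names are illustrative, not Cassandra's API. */
final class TwoInstanceSketch {
    /** Minimal stand-in for an sstable; repairedAt == 0 means unrepaired. */
    static final class SSTable {
        final String name;
        final long first;
        final long last;
        int level;
        long repairedAt;

        SSTable(String name, long first, long last, int level, long repairedAt) {
            this.name = name;
            this.first = first;
            this.last = last;
            this.level = level;
            this.repairedAt = repairedAt;
        }

        boolean isRepaired() {
            return repairedAt > 0;
        }

        boolean overlaps(SSTable other) {
            return first <= other.last && other.first <= last;
        }
    }

    /** A strategy instance that only knows the sstables it was given. */
    static final class Strategy {
        final List<SSTable> tracked = new ArrayList<>();

        /** Keep the sstable's level if nothing in that level overlaps it; otherwise send it to L0. */
        void add(SSTable table) {
            if (table.level > 0) {
                for (SSTable other : tracked) {
                    if (other.level == table.level && other.overlaps(table)) {
                        table.level = 0;
                        break;
                    }
                }
            }
            tracked.add(table);
        }

        void remove(SSTable table) {
            tracked.remove(table);
        }
    }

    /** Holds one strategy per repair status and routes sstables between them. */
    static final class Wrapper {
        final Strategy unrepaired = new Strategy();
        final Strategy repaired = new Strategy();

        void add(SSTable table) {
            (table.isRepaired() ? repaired : unrepaired).add(table);
        }

        /** Called after an incremental repair has fully covered the sstable. */
        void markRepaired(SSTable table, long repairedAt) {
            unrepaired.remove(table);
            table.repairedAt = repairedAt;
            repaired.add(table);
        }
    }

    public static void main(String[] args) {
        Wrapper wrapper = new Wrapper();
        SSTable c = new SSTable("c", 0, 100, 2, 1L); // already repaired, level 2
        SSTable a = new SSTable("a", 0, 50, 2, 0L);  // unrepaired, level 2
        SSTable d = new SSTable("d", 0, 50, 3, 0L);  // unrepaired, level 3
        wrapper.add(c);
        wrapper.add(a);
        wrapper.add(d);
        wrapper.markRepaired(a, 5L); // overlaps c in level 2, so it drops to L0
        wrapper.markRepaired(d, 5L); // level 3 is free in the repaired instance
        System.out.println(a.name + " -> L" + a.level);
        System.out.println(d.name + " -> L" + d.level);
    }
}
```

Neither Strategy instance inspects repairedAt in this sketch; only the wrapper does, which mirrors the goal of removing all repaired/unrepaired logic from the strategy itself.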



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
