cassandra-commits mailing list archives

From "Marcus Eriksson (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
Date Tue, 29 Nov 2016 13:14:59 GMT


Marcus Eriksson commented on CASSANDRA-9143:

Looks good in general - comments:

* Rename the cleanup compaction task - the name is very confusing wrt the existing cleanup compactions
* Should we prioritize the pending-repair-cleanup compactions?
** If we don't, we might end up comparing different datasets: a repair fails halfway through, one
node happens to move the pending data back to unrepaired, the operator retriggers repair, and we
compare different datasets. If we instead move the data back as quickly as possible, we minimize
this window
** It would also help subsequent normal compactions, as we might be able to include more sstables
in the repaired/unrepaired strategies
* Is there any point in doing anticompaction after repair with -full repairs? Can we always
do consistent repairs? We would need to anticompact already repaired sstables into pending,
but that should not be a big problem?
* In CompactionManager#getSSTablesToValidate we still mark all unrepaired sstables as repairing
- we don't need to do that for consistent repairs. And if we can do consistent repair for
-full as well, all that code can be removed
* In handleStatusRequest - if we don't have the local session, we should probably return that
the session is failed?
* Fixed some minor nits here:
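The handleStatusRequest point above can be sketched as follows. This is a minimal illustration, not Cassandra's actual API: the `LocalSessions` class, `SessionState` enum, and method names here are hypothetical stand-ins for whatever the patch defines. The idea is simply that a status request for a session the node has no record of should be answered with a failed state, so the coordinator can fail fast instead of waiting on a session that will never complete.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical session states; names are illustrative only.
enum SessionState { PREPARING, REPAIRING, FINALIZED, FAILED }

class LocalSessions {
    private final Map<UUID, SessionState> sessions = new HashMap<>();

    void put(UUID sessionId, SessionState state) {
        sessions.put(sessionId, state);
    }

    // A status request for an unknown session is reported as FAILED
    // rather than left unanswered or treated as in-progress.
    SessionState handleStatusRequest(UUID sessionId) {
        return sessions.getOrDefault(sessionId, SessionState.FAILED);
    }
}
```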

> Improving consistency of repairAt field across replicas 
> --------------------------------------------------------
>                 Key: CASSANDRA-9143
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Blake Eggleston
> We currently send an anticompaction request to all replicas. During this, a node will
split sstables and mark the appropriate ones repaired. 
> The problem is that this could fail on some replicas for many reasons, leading to problems
in the next repair. 
> This is what I am suggesting to improve it. 
> 1) Send anticompaction request to all replicas. This can be done at session level. 
> 2) During anticompaction, sstables are split but not marked repaired. 
> 3) When we get positive ack from all replicas, coordinator will send another message
called markRepaired. 
> 4) On getting this message, replicas will mark the appropriate sstables as repaired. 
> This will reduce the window of failure. We can also think of "hinting" markRepaired message
if required. 
> Also the sstables which are streamed can be marked as repaired, as is done now. 
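The four steps in the quoted proposal amount to a two-phase protocol on the coordinator. A minimal sketch of that flow is below; the `RepairCoordinator` class, `Replica` interface, and method names are hypothetical and stand in for Cassandra's actual messaging. The key property is that sstables are only promoted to repaired after every replica has acked the anticompaction, so a failure on any replica leaves data in the pending state everywhere instead of marking some replicas repaired and others not.

```java
import java.util.List;

class RepairCoordinator {
    // Hypothetical replica interface; stands in for Cassandra's real messaging.
    interface Replica {
        boolean anticompact(long sessionId); // steps 1-2: split sstables, don't mark repaired
        void markRepaired(long sessionId);   // step 4: promote the split sstables
    }

    // Returns true only if all replicas acked and were then marked repaired.
    static boolean finishRepair(long sessionId, List<Replica> replicas) {
        // Steps 1-2: every replica splits its sstables into a pending bucket.
        for (Replica r : replicas) {
            if (!r.anticompact(sessionId)) {
                return false; // failure: no replica is marked repaired
            }
        }
        // Steps 3-4: positive acks from all replicas, now promote everywhere.
        for (Replica r : replicas) {
            r.markRepaired(sessionId);
        }
        return true;
    }
}
```

Because markRepaired is only sent after all acks, the window in which replicas disagree shrinks to the second phase, which is where the "hinting" of markRepaired mentioned above would apply.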

This message was sent by Atlassian JIRA
