cassandra-commits mailing list archives

From "Blake Eggleston (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
Date Wed, 30 Nov 2016 17:55:58 GMT


Blake Eggleston commented on CASSANDRA-9143:

bq. Should we prioritize the pending-repair-cleanup compactions?

Makes sense.

bq. Is there any point in doing anticompaction after repair with -full repairs? Can we always
do consistent repairs? We would need to anticompact already repaired sstables into pending,
but that should not be a big problem?

Good point. I'd say we should keep full repairs simple. Don't do anti-compaction on them,
and don't make them consistent. Given the newness and relative complexity of consistent repair,
it would be smart to have a full workaround in case we find a problem with it. If we're not
going to do anti-compaction though, we should preserve repairedAt values of the sstables we're
streaming around as part of a full repair. That will make it possible to fix corrupted or
lost data in the repair buckets without adversely affecting the next incremental repair.
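A minimal sketch of the repairedAt-preserving idea above. The class and method names here are illustrative, not the actual Cassandra streaming API: the point is simply that a full repair carries the source sstable's repairedAt value through the stream, while an incremental repair stamps the session's repair time.

```java
// Hypothetical sketch, not Cassandra's real streaming code.
public final class StreamMetadataSketch
{
    public static final long UNREPAIRED = 0L;

    /**
     * Pick the repairedAt value for an sstable being streamed.
     * Full repairs preserve the source value so repaired data stays in
     * the repaired bucket; incremental repairs use the session's time.
     */
    public static long repairedAtForStream(boolean isFullRepair,
                                           long sourceRepairedAt,
                                           long sessionRepairedAt)
    {
        return isFullRepair ? sourceRepairedAt : sessionRepairedAt;
    }
}
```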

bq. In handleStatusRequest - if we don't have the local session, we should probably return
that the session is failed?

That makes sense.
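The status-request behavior discussed above could look roughly like this. `LocalSessionsSketch`, `Status`, and `handleStatusRequest` are illustrative names (the real code lives elsewhere); the key behavior is that a request for a session we don't know about is answered as FAILED so the requester can clean up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of the handler, not the actual Cassandra API.
public final class LocalSessionsSketch
{
    public enum Status { REPAIRING, FINALIZED, FAILED }

    private final Map<UUID, Status> sessions = new HashMap<>();

    public void put(UUID sessionId, Status status)
    {
        sessions.put(sessionId, status);
    }

    /** Unknown sessions are reported as FAILED rather than left hanging. */
    public Status handleStatusRequest(UUID sessionId)
    {
        Status status = sessions.get(sessionId);
        return status != null ? status : Status.FAILED;
    }
}
```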

> Improving consistency of repairAt field across replicas 
> --------------------------------------------------------
>                 Key: CASSANDRA-9143
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Blake Eggleston
> We currently send an anticompaction request to all replicas. During this, a node will
split sstables and mark the appropriate ones repaired. 
> The problem is that this could fail on some replicas for many reasons, causing problems
in the next repair. 
> This is what I am suggesting to improve it. 
> 1) Send anticompaction request to all replicas. This can be done at session level. 
> 2) During anticompaction, sstables are split but not marked repaired. 
> 3) When we get positive ack from all replicas, coordinator will send another message
called markRepaired. 
> 4) On getting this message, replicas will mark the appropriate sstables as repaired. 
> This will reduce the window of failure. We can also think of "hinting" markRepaired message
if required. 
> Also the sstables which are streaming can be marked as repaired like it is done now. 
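The four numbered steps quoted above amount to a two-phase protocol: split first, then mark only after every replica has acked. A rough coordinator-side sketch, assuming a hypothetical `Replica` interface (not the actual Cassandra messaging API):

```java
import java.util.List;

// Hypothetical sketch of the proposed two-phase flow.
public final class RepairCoordinatorSketch
{
    public interface Replica
    {
        boolean anticompact(long sessionId); // step 1-2: split sstables, don't mark
        void markRepaired(long sessionId);   // step 4: flip repairedAt on the splits
    }

    /**
     * Step 3: only after a positive ack from every replica does the
     * coordinator send markRepaired, shrinking the failure window.
     */
    public static boolean finishSession(long sessionId, List<Replica> replicas)
    {
        for (Replica r : replicas)
            if (!r.anticompact(sessionId))
                return false; // any failed ack aborts; nothing was marked yet

        for (Replica r : replicas)
            r.markRepaired(sessionId);
        return true;
    }
}
```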

This message was sent by Atlassian JIRA
