cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jeff Jirsa (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
Date Thu, 19 Jan 2017 00:05:26 GMT


Jeff Jirsa commented on CASSANDRA-9143:

I know [~krummas] has review (and he's certainly the most qualified and most likely to give
the best feedback here), but a I took a few notes as I read through the patches:

- ARS {{registerParentRepairSession}} assert is inverted
- double-checked locking needs volatile
- Comment in CompactionStrategyManager about 2 strategies per data dir is now misleading,
if not incorrect (also have pendingRepairs, which may be multiple other strategies?):
- {{ConsistentSession.AbstractBuilder}} doesn't need to be public (especially with other {{AbstractBuilder}}s
in the codebase)
- {{LocalSessions.start()}} loads rows creating builders that should always work, but we've
seen in the past (like CASSANDRA-12700) that we shouldn't rely on all of those values to be
correct - maybe you can try to explicitly handle invalid rows being returned? If incomplete
row/session, maybe that's failed by definition?
- If you remove the enum definition for anticompaction request (
), and we're in a mixed cluster for some reason, and get that verb on the wire, we'll throw
- perhaps instead we should probably keep that verb, but log+ignore it if received. 

Less Importantly: 
- {{needsCleanup()}} could be renamed to avoid potentially confusing {{CompactionManager.needsCleanup()}},
especially since {{PendingRepairManager}} is very very much like {{CompactionManager}} (seems
as if you may have started this in
- log from address as well, otherwise the log message is much less useful?
- could pass address for better logging here as well (and pretty much all of these, {{handlePrepareMessage}},
{{handleFinalizeCommitMessage}}, etc)

> Improving consistency of repairAt field across replicas 
> --------------------------------------------------------
>                 Key: CASSANDRA-9143
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Blake Eggleston
> We currently send an anticompaction request to all replicas. During this, a node will
split stables and mark the appropriate ones repaired. 
> The problem is that this could fail on some replicas due to many reasons leading to problems
in the next repair. 
> This is what I am suggesting to improve it. 
> 1) Send anticompaction request to all replicas. This can be done at session level. 
> 2) During anticompaction, stables are split but not marked repaired. 
> 3) When we get positive ack from all replicas, coordinator will send another message
called markRepaired. 
> 4) On getting this message, replicas will mark the appropriate stables as repaired. 
> This will reduce the window of failure. We can also think of "hinting" markRepaired message
if required. 
> Also the stables which are streaming can be marked as repaired like it is done now. 

This message was sent by Atlassian JIRA

View raw message