cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jeff Jirsa (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9143) Improving consistency of repairAt field across replicas
Date Thu, 19 Jan 2017 00:05:26 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829017#comment-15829017
] 

Jeff Jirsa commented on CASSANDRA-9143:
---------------------------------------

I know [~krummas] has review (and he's certainly the most qualified and most likely to give
the best feedback here), but a I took a few notes as I read through the patches:

- ARS {{registerParentRepairSession}} assert is inverted
- https://github.com/bdeggleston/cassandra/commit/8e7eb081625b1749716f60bcb109ade8c84d8558#diff-93e6fa14f908d0ce3c24d56fbf484ba3R88
- double-checked locking needs volatile
- Comment in CompactionStrategyManager about 2 strategies per data dir is now misleading,
if not incorrect (also have pendingRepairs, which may be multiple other strategies?): https://github.com/bdeggleston/cassandra/blob/9143-trunk/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java#L59
- {{ConsistentSession.AbstractBuilder}} doesn't need to be public (especially with other {{AbstractBuilder}}s
in the codebase)
- {{LocalSessions.start()}} loads rows creating builders that should always work, but we've
seen in the past (like CASSANDRA-12700) that we shouldn't rely on all of those values to be
correct - maybe you can try to explicitly handle invalid rows being returned? If incomplete
row/session, maybe that's failed by definition?
- If you remove the enum definition for anticompaction request ( https://github.com/bdeggleston/cassandra/commit/76ee1a667818c5c72aa513c4a75777b1400cb69d#diff-9a5c76380064186d8f89003e1bab73bfL46
), and we're in a mixed cluster for some reason, and get that verb on the wire, we'll throw
- perhaps instead we should probably keep that verb, but log+ignore it if received. 

Less Importantly: 
- https://github.com/bdeggleston/cassandra/commit/8e7eb081625b1749716f60bcb109ade8c84d8558#diff-93e6fa14f908d0ce3c24d56fbf484ba3R303
- {{needsCleanup()}} could be renamed to avoid potentially confusing {{CompactionManager.needsCleanup()}},
especially since {{PendingRepairManager}} is very very much like {{CompactionManager}} (seems
as if you may have started this in https://github.com/bdeggleston/cassandra/commit/ade7fe3373a1b44da02caaefc3180503d298e92b
)
- https://github.com/bdeggleston/cassandra/blob/9143-trunk/src/java/org/apache/cassandra/repair/consistent/LocalSessions.java#L605
- log from address as well, otherwise the log message is much less useful?
- https://github.com/bdeggleston/cassandra/blob/9143-trunk/src/java/org/apache/cassandra/repair/consistent/LocalSessions.java#L619
- could pass address for better logging here as well (and pretty much all of these, {{handlePrepareMessage}},
{{handleFinalizeCommitMessage}}, etc)



> Improving consistency of repairAt field across replicas 
> --------------------------------------------------------
>
>                 Key: CASSANDRA-9143
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9143
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Blake Eggleston
>
> We currently send an anticompaction request to all replicas. During this, a node will
split stables and mark the appropriate ones repaired. 
> The problem is that this could fail on some replicas due to many reasons leading to problems
in the next repair. 
> This is what I am suggesting to improve it. 
> 1) Send anticompaction request to all replicas. This can be done at session level. 
> 2) During anticompaction, stables are split but not marked repaired. 
> 3) When we get positive ack from all replicas, coordinator will send another message
called markRepaired. 
> 4) On getting this message, replicas will mark the appropriate stables as repaired. 
> This will reduce the window of failure. We can also think of "hinting" markRepaired message
if required. 
> Also the stables which are streaming can be marked as repaired like it is done now. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message