cassandra-commits mailing list archives

From "Blake Eggleston (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-3200) Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
Date Thu, 12 Oct 2017 22:49:00 GMT


Blake Eggleston commented on CASSANDRA-3200:

bq. yeah I agree it duplicates a lot of code, but they also do different things - the asymmetric
ones don't need the merkle trees for example since we compare everything outside of this class
now. Let me know if you see a straight-forward way to do it. I'll try to break out the common
code in a separate class. Hopefully the non-symmetric classes can be removed once we have
confidence this works as well.

Good point, I wasn’t paying attention to the stuff going on in their respective base classes.

bq. indentation looks good to me and they look good on github or am I misunderstanding you?

The formatting of the matrices looks good, they just look weird starting at column 0 when
the rest of the method / comment is indented 8 spaces. iow, something like this:

      ... something ...

        A   B   C   D   E
      A     =   x   x   x
      B         x   x   x
      C             x   x
      D                 =

Second round of review:

Everything looks good for the most part, and your optimization / stream reduction stuff makes
sense. There are just a few minor things:

* {{hasDifferencesFor}} isEmpty check is unnecessary

* Probably don’t need this class, ImmutableMap<InetAddress, HostDifferences> should
be fine

* default for optimizeStreams seems to be false, but javadoc says it’s true

* uncomment or remove logger info statement at line 95

* startTime is compared to Long.MIN_VALUE in {{finished}}, but it is never initialized to that
value. Unless I’m mistaken, long values that aren’t explicitly initialized to some value
become 0 by default, so that branch in finished will always run, even if {{run}} wasn’t
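
To illustrate the {{startTime}} point, here is a minimal Java sketch. The class and method names ({{StartTimeSketch}}, {{looksStartedImplicit}}, {{looksStartedExplicit}}) are hypothetical and not taken from the patch, and the direction of the check ({{!= Long.MIN_VALUE}}) is an assumption about what {{finished}} does. It only shows that an uninitialized {{long}} field defaults to 0, so a sentinel comparison needs an explicit initializer:

```java
// Hypothetical stand-in for the class under review; only the field
// initialization pattern is modeled here, not the real repair code.
class StartTimeSketch {
    // No initializer: Java gives a long field the default value 0.
    private long implicitStart;

    // Explicit sentinel, so "never started" can be told apart from a real timestamp.
    private long explicitStart = Long.MIN_VALUE;

    // Assumed shape of the check in finished(): true means "treat as started".
    boolean looksStartedImplicit() {
        return implicitStart != Long.MIN_VALUE;
    }

    boolean looksStartedExplicit() {
        return explicitStart != Long.MIN_VALUE;
    }

    // Models run(): records the start time in both fields.
    void run() {
        long now = System.currentTimeMillis();
        implicitStart = now;
        explicitStart = now;
    }

    public static void main(String[] args) {
        StartTimeSketch s = new StartTimeSketch();
        // run() has not been called yet.
        System.out.println(s.looksStartedImplicit()); // true: 0 != Long.MIN_VALUE
        System.out.println(s.looksStartedExplicit()); // false: still the sentinel
    }
}
```

With the explicit initializer, the sentinel comparison only succeeds after {{run}} has actually set the field.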

> Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
> -----------------------------------------------------------------------------------------
>                 Key: CASSANDRA-3200
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Marcus Eriksson
>            Priority: Minor
>              Labels: repair
>             Fix For: 4.x
> Currently, repair compares merkle trees by pair, in isolation of any other tree. What
that means concretely is that if I have three nodes A, B and C (RF=3) with A and B in sync,
but C having some range r inconsistent with both A and B (since those are consistent), we will
do the following transfers of r: A -> C, C -> A, B -> C, C -> B.
> The fact that we do both A -> C and C -> A is fine, because we cannot know which
one is more up to date between A and C. However, the transfer B -> C is useless provided we do
A -> C if A and B are in sync. Not doing that transfer will be a 25% improvement in that
case. With RF=5 and only one node inconsistent with all the others, that is almost a 40% improvement.
> Given that this situation of one node not in sync while the others are is probably fairly
common (one node died so it is behind), this could be a fair improvement over what is transferred.
In the case where we use repair to completely rebuild a node, this will be a dramatic improvement,
because it will keep the rebuilt node from getting RF times the data it should get.
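
The 25% and "almost 40%" figures in the description can be checked with a small Java sketch. It assumes the simple model the description uses: exactly one replica is out of sync for range r, all other replicas agree with each other, and every transfer of r costs the same. The class and method names are hypothetical and only serve the arithmetic; this is not how the repair code counts streams.

```java
// Hypothetical helper for the arithmetic in the issue description.
class RepairTransferSketch {
    // Pairwise comparison: the out-of-sync node exchanges r in both
    // directions with each of the (rf - 1) in-sync replicas.
    static int pairwiseTransfers(int rf) {
        return 2 * (rf - 1);
    }

    // Comparing all trees together: the out-of-sync node still sends r to
    // each in-sync replica, but only needs to receive it from one of them.
    static int optimizedTransfers(int rf) {
        return (rf - 1) + 1;
    }

    // Share of transfers avoided, in percent.
    static double savedPercent(int rf) {
        int pairwise = pairwiseTransfers(rf);
        return 100.0 * (pairwise - optimizedTransfers(rf)) / pairwise;
    }

    public static void main(String[] args) {
        for (int rf : new int[] {3, 5}) {
            System.out.println("RF=" + rf
                    + " pairwise=" + pairwiseTransfers(rf)
                    + " optimized=" + optimizedTransfers(rf)
                    + " saved%=" + savedPercent(rf));
        }
    }
}
```

Under these assumptions RF=3 goes from 4 transfers to 3 (25% fewer) and RF=5 goes from 8 to 5 (37.5% fewer), which matches the numbers quoted above.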

This message was sent by Atlassian JIRA
