cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Marcus Eriksson (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-3200) Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
Date Thu, 05 Oct 2017 12:40:00 GMT


Marcus Eriksson commented on CASSANDRA-3200:

haha yeah that was horribly unreadable coming back to it, sorry about that, refactored version
up [here|], hopefully it
makes more sense now.

bq. User facing:
bq. symmetric/asymmetric nodetool naming option is ambiguous, not sure what a better name
would be, maybe something about reducing or optimizing streams?
made it {{--optimise-streams}} (or {{-os}})
bq. should be off by default
bq. AsymmetricSyncRequest/SyncTasks:
bq. Could we just add a one-way flag to the existing requests / tasks? The new asymmetric
classes duplicate most of the symmetric tasks code (I think). In the case of local sync task,
the pullRepair flag is basically doing this already.
yeah I agree it duplicates a lot of code, but they also do different things - the asymmetric
ones don't need the merkle trees for example since we compare everything outside of this class
now. Let me know if you see a straight-forward way to do it.  I'll try to break out the common
code in a separate class. Hopefully the non-symmetric classes can be removed once we have
confidence this works as well.
bq. IncomingRepairStreamTracker
bq. fixing the container thing as mentioned above may fix this, but it’s difficult to figure
out how this works. A top level java doc explaining how the duplicate streams are identified
and reduced would be nice.
hopefully the refactor makes this clearer, the class is now tiny and only actually tracks
the incoming streams for each node
bq. The class name doesn’t seem appropriate. Not all the streams are incoming, and it’s
not tracking any continuous processes. Maybe RepairStreamReducer or RepairStreamOptimizer?
Moved the reduce logic to {{ReduceHelper}}
bq. Should be in the repair package.
bq. IncomingRepairStreamTrackerTest
bq. Should throw exception instead of printing stack trace in static block
bq. Fix indentation of matrices in test comments
indentation looks good to me and they look good on [github|]
or am I misunderstanding you?
bq. The content of the `differences` map, as set up in testSimpleReducing doesn’t make sense
to me, why would node C be in node A’s map, but note vice versa?
this is the way we compare trees - we compare A to C, but not the other way, and it makes
no sense adding the C -> A difference since we don't need it for the calculations
bq. I think it would be clearer to alias the contents of addresses 0-4 to static variables
like A, B, C, etc. Parsing out the array indices when reading through the tests is difficult
to follow.

> Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
> -----------------------------------------------------------------------------------------
>                 Key: CASSANDRA-3200
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Marcus Eriksson
>            Priority: Minor
>              Labels: repair
>             Fix For: 4.x
> Currently, repair compare merkle trees by pair, in isolation of any other tree. What
that means concretely is that if I have three node A, B and C (RF=3) with A and B in sync,
but C having some range r inconsitent with both A and B (since those are consistent), we will
do the following transfer of r: A -> C, C -> A, B -> C, C -> B.
> The fact that we do both A -> C and C -> A is fine, because we cannot know which
one is more to date from A or C. However, the transfer B -> C is useless provided we do
A -> C if A and B are in sync. Not doing that transfer will be a 25% improvement in that
case. With RF=5 and only one node inconsistent with all the others, that almost a 40% improvement,
> Given that this situation of one node not in sync while the others are is probably fairly
common (one node died so it is behind), this could be a fair improvement over what is transferred.
In the case where we use repair to rebuild completely a node, this will be a dramatic improvement,
because it will avoid the rebuilded node to get RF times the data it should get.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message