cassandra-commits mailing list archives

From "Marcus Eriksson (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-3200) Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
Date Thu, 01 Jun 2017 12:29:04 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032897#comment-16032897 ]

Marcus Eriksson edited comment on CASSANDRA-3200 at 6/1/17 12:28 PM:
---------------------------------------------------------------------

Branch for this here:
https://github.com/krummas/cassandra/commits/marcuse/CASSANDRA-3200
dtests:
https://github.com/krummas/cassandra-dtest/commits/marcuse/mt_calcs

So, this does what is described in the description - if we repair 3 nodes, A, B, C, and B
has a range out of sync but A and C are equal, we only stream to B from either A or C.

It does this by introducing 'asymmetric syncing': when we compare the merkle trees, we let
each node track its incoming streams, and whenever we add an incoming stream, we check whether
we are already streaming the same data from another node. This might increase the number of
SyncRequest messages sent by the repair coordinator, since it only ever asks remote nodes to fetch
ranges from other nodes and never to push any out (this could be optimised of course, but I doubt it is
a problem).
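
To make the fetch-only idea concrete, here is a minimal sketch, not the code on the branch. It assumes each node's state for one range is summarised by a single digest (a stand-in for the merkle tree comparison), and the name {{plan_fetches}} is hypothetical. Nodes with equal digests are grouped, and every node gets one set of interchangeable candidate sources per group that differs from it.

```python
from collections import defaultdict


def plan_fetches(digests):
    """Map each node to the candidate source sets it should fetch from.

    `digests` maps node name -> digest of the range on that node (an
    assumption standing in for a merkle tree comparison). Nodes in the
    same returned frozenset hold identical data, so one of them suffices.
    """
    groups = defaultdict(set)
    for node, digest in digests.items():
        groups[digest].add(node)

    plan = {}
    for node, digest in digests.items():
        # One fetch per group that differs from this node; never a push.
        plan[node] = [frozenset(members)
                      for other, members in groups.items()
                      if other != digest]
    return plan


plan = plan_fetches({"A": "x", "B": "y", "C": "x"})
print(plan["B"])  # a single candidate set containing A and C
```

With A and C equal and B out of sync, B ends up with a single incoming stream that can be served by either A or C, while A and C each still fetch from B.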

It does not compare the leaves in the merkle trees; instead it denormalizes the ranges as
we add them. For example, say that node {{A}} has an incoming stream from {{B}} on {{(0, 100]}},
and then we add that {{A}} needs to stream {{(50, 100]}} from {{C}}. The resulting incoming
streams to {{A}} would be {{(0, 50]}} from {{B}} and {{(50, 100]}} from *either* {{B}} or
{{C}} (assuming {{B}} and {{C}} are equal on the range {{(50, 100]}}).
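
The range splitting in this example can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: ranges are plain integer {{(start, end]}} pairs with no wrap-around, the function name {{add_incoming}} is hypothetical, and every source recorded for a segment is assumed to hold identical data on that segment.

```python
def add_incoming(streams, start, end, source):
    """Record that (start, end] can be fetched from `source`.

    `streams` maps disjoint (start, end] tuples to the set of
    interchangeable sources for that segment. Returns a new dict.
    Wrap-around ranges are not handled in this sketch.
    """
    # Collect every boundary so all ranges are cut at the same points.
    points = {start, end}
    for s, e in streams:
        points.update((s, e))
    cuts = sorted(points)

    result = {}
    for lo, hi in zip(cuts, cuts[1:]):
        sources = set()
        for (s, e), existing in streams.items():
            if s <= lo and hi <= e:
                sources |= existing
        if start <= lo and hi <= end:
            sources.add(source)
        if sources:  # skip gaps that nobody needs to stream
            result[(lo, hi)] = sources
    return result


incoming_to_a = add_incoming({}, 0, 100, "B")
incoming_to_a = add_incoming(incoming_to_a, 50, 100, "C")
print(incoming_to_a)
```

After the two calls, {{(0, 50]}} is served by B only and {{(50, 100]}} by either B or C, matching the example above.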

It tries to pick the least loaded node when we have the option to stream from several nodes,
with preference to same-dc nodes.
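
A rough sketch of that selection rule, under assumptions of mine: "load" is taken to be the number of streams a node has already been asked to serve, and the names {{pick_source}}, {{load}}, {{dc_of}} and {{local_dc}} are hypothetical rather than taken from the branch.

```python
def pick_source(candidates, load, dc_of, local_dc):
    """Pick one node to stream from among interchangeable candidates.

    Same-datacenter candidates are preferred when any exist; within the
    chosen pool the node with the lowest load wins, and ties are broken
    by name so the result is deterministic.
    """
    same_dc = [n for n in candidates if dc_of[n] == local_dc]
    pool = same_dc or list(candidates)
    return min(sorted(pool), key=lambda n: load.get(n, 0))


dc_of = {"B": "dc1", "C": "dc1", "D": "dc2"}
load = {"B": 3, "C": 1, "D": 0}
print(pick_source({"B", "C", "D"}, load, dc_of, "dc1"))  # C
```

Here D is the least loaded overall but sits in another datacenter, so the choice falls to C, the least loaded same-dc node.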

The old symmetric syncing can still be run by passing {{-ss}} to nodetool repair.


was (Author: krummas):
Branch for this here:
https://github.com/krummas/cassandra/commits/marcuse/CASSANDRA-3200
dtests:
https://github.com/krummas/cassandra-dtest/commits/marcuse/mt_calcs

So, this does what is described in the description - if we repair 3 nodes, A, B, C, and B
has a range out of sync but A and C are equal, we only stream to B from either A or C.

It does this by introducing 'asymmetric syncing' - when we compare the merkle trees, we let
each node track its incoming streams, and whenever we add an incoming stream, we check if
we are already streaming the same data from another node. This might increase the number of
SyncRequest messages sent by repair coordinator since it only ever asks remote nodes to fetch
ranges from other nodes, never push out any (this could be optimised ofc, but I doubt it is
a problem).

It does not compare the leaves in the merkle trees, instead it denormalizes the ranges as
we add them, for example, say that node {{A}} has an incoming stream from {{B}} on {{[0, 100)}},
but then we add that {{A}} needs to stream {{[50, 100)}} from {{C}}, then the resulting incoming
streams to {{A}} would be {{[0, 50)}} from {{B}} and {{[50, 100)}} from *either* {{B}} or
{{C}} (assuming {{B}} and {{C}} are equal on the range {{[50, 100)}})

It tries to pick the least loaded node when we have the option to stream from several nodes,
with preference to same-dc nodes.

Old symmetric syncing can be run by passing {{-ss}} to nodetool repair

> Repair: compare all trees together (for a given range/cf) instead of by pair in isolation
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3200
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3200
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Marcus Eriksson
>            Priority: Minor
>              Labels: repair
>             Fix For: 4.x
>
>
> Currently, repair compares merkle trees by pair, in isolation from any other tree. What
that means concretely is that if I have three nodes A, B and C (RF=3) with A and B in sync,
but C having some range r inconsistent with both A and B (since those are consistent), we will
do the following transfers of r: A -> C, C -> A, B -> C, C -> B.
> The fact that we do both A -> C and C -> A is fine, because we cannot know which
one is more up to date between A and C. However, the transfer B -> C is useless provided we do
A -> C if A and B are in sync. Not doing that transfer will be a 25% improvement in that
case. With RF=5 and only one node inconsistent with all the others, that's almost a 40% improvement,
etc.
> Given that this situation of one node not in sync while the others are is probably fairly
common (one node died so it is behind), this could be a fair improvement over what is transferred.
In the case where we use repair to completely rebuild a node, this will be a dramatic improvement,
because it will keep the rebuilt node from getting RF times the data it should get.
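
The percentages in the description can be checked with a small count. This is a sketch under simplifying assumptions: each node's copy of range r is summarised by one digest, pairwise repair sends r in both directions for every mismatched pair, and the grouped approach has each node receive r once per group of nodes that differs from it. The function names are hypothetical.

```python
def pairwise_transfers(digests):
    """Transfers of r when every mismatched pair streams both ways."""
    nodes = sorted(digests)
    mismatched = sum(
        1
        for i, a in enumerate(nodes)
        for b in nodes[i + 1:]
        if digests[a] != digests[b]
    )
    return 2 * mismatched


def grouped_transfers(digests):
    """Transfers of r when each node fetches once per differing group."""
    distinct = len(set(digests.values()))
    return len(digests) * (distinct - 1)


rf3 = {"A": "x", "B": "x", "C": "y"}
rf5 = {"A": "x", "B": "x", "C": "x", "D": "x", "E": "y"}
for name, d in (("RF=3", rf3), ("RF=5", rf5)):
    before, after = pairwise_transfers(d), grouped_transfers(d)
    print(name, before, after, 1 - after / before)
```

For RF=3 this gives 4 transfers against 3 (25% fewer), and for RF=5 it gives 8 against 5 (37.5% fewer, the "almost 40%" above).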



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
