cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "sankalp kohli (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10887) Pending range calculator gives wrong pending ranges for moves
Date Sat, 19 Dec 2015 00:30:46 GMT


sankalp kohli commented on CASSANDRA-10887:

I have added a patch along with many unit tests. Looks like the issue with moves is there
since the feature was added in CASSANDRA-1427. 
This patch fixes the following issues
1) It will add the correct endpoints to the pending range not just the endpoint which is moving.

2) It will remove the ranges which the endpoint is already getting. Eg: If an endpoint is
getting range 10-20 before the move and needs range say 15-20 after the move, it will only
add 15-20. 

I think 2) will solve the problem of duplicate endpoints in pending endpoint and natural endpoint.
Thought I have not tested this part. 

Most of the patch is unit tests which simulate many types of moves. We should see if there
is any case of move which is missing. 

PS: I have not tested this patch apart from the unit test :)

> Pending range calculator gives wrong pending ranges for moves
> -------------------------------------------------------------
>                 Key: CASSANDRA-10887
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination
>            Reporter: Richard Low
>            Assignee: sankalp kohli
>            Priority: Critical
>         Attachments: CASSANDRA-10887.diff
> My understanding is the PendingRangeCalculator is meant to calculate who should receive
extra writes during range movements. However, it adds the wrong ranges for moves. An extreme
example of this can be seen in the following reproduction. Create a 5 node cluster (I did
this on 2.0.16 and 2.2.4) and a keyspace RF=3 and a simple table. Then start moving a node
and immediately kill -9 it. Now you see a node as down and moving in the ring. Try a quorum
write for a partition that is stored on that node - it will fail with a timeout. Further,
all CAS reads or writes fail immediately with unavailable exception because they attempt to
include the moving node twice. This is likely to be the cause of CASSANDRA-10423.
> In my example I had this ring:
>  rack1       Up     Normal  170.97 KB       20.00%              -9223372036854775808
>  rack1       Up     Normal  124.06 KB       20.00%              -5534023222112865485
>  rack1       Down   Moving  108.7 KB        40.00%              1844674407370955160
>  rack1       Up     Normal  142.58 KB       0.00%               1844674407370955161
>  rack1       Up     Normal  118.64 KB       20.00%              5534023222112865484
> Node 3 was moving to -1844674407370955160. I added logging to print the pending and natural
endpoints. For ranges owned by node 3, node 3 appeared in pending and natural endpoints. The
blockFor is increased to 3 so we’re effectively doing CL.ALL operations. This manifests
as write timeouts and CAS unavailables when the node is down.
> The correct pending range for this scenario is node 1 is gaining the range (-1844674407370955160,
1844674407370955160). So node 1 should be added as a destination for writes and CAS for this
range, not node 3.

This message was sent by Atlassian JIRA

View raw message