cassandra-commits mailing list archives

From "Richard Low (JIRA)" <>
Subject [jira] [Created] (CASSANDRA-10887) Pending range calculator gives wrong pending ranges for moves
Date Thu, 17 Dec 2015 04:27:46 GMT
Richard Low created CASSANDRA-10887:

             Summary: Pending range calculator gives wrong pending ranges for moves
                 Key: CASSANDRA-10887
             Project: Cassandra
          Issue Type: Bug
          Components: Coordination
            Reporter: Richard Low
            Priority: Critical

My understanding is that the PendingRangeCalculator is meant to calculate which nodes should
receive extra writes during range movements. However, it adds the wrong ranges for moves. An
extreme example of this can be seen in the following reproduction. Create a 5 node cluster (I
did this on 2.0.16 and 2.2.4), a keyspace with RF=3, and a simple table. Then start moving a
node and immediately kill -9 it. The ring now shows that node as down and moving. Try a quorum
write for a partition that is stored on that node: it will fail with a timeout. Further, all
CAS reads and writes fail immediately with an unavailable exception because they attempt to
include the moving node twice. This is likely to be the cause of CASSANDRA-10423.

In my example I had this ring:

Rack        Status State   Load            Owns                Token
rack1       Up     Normal  170.97 KB       20.00%              -9223372036854775808
rack1       Up     Normal  124.06 KB       20.00%              -5534023222112865485
rack1       Down   Moving  108.7 KB        40.00%              1844674407370955160
rack1       Up     Normal  142.58 KB       0.00%               1844674407370955161
rack1       Up     Normal  118.64 KB       20.00%              5534023222112865484

Node 3 (the down, moving node) was moving to -1844674407370955160. I added logging to print
the pending and natural endpoints. For ranges owned by node 3, node 3 appeared in both the
pending and the natural endpoints. The blockFor is increased to 3, so we’re effectively doing
CL.ALL operations. This manifests as write timeouts and CAS unavailables when the node is down.
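
The effect on quorum writes can be illustrated with a small, simplified model. This is only a
sketch, not Cassandra code: it assumes blockFor is the quorum size plus the number of pending
endpoints (consistent with the value of 3 observed above), and the names quorum and
write_outcome are hypothetical helpers.

```python
def quorum(rf):
    """Quorum size for a given replication factor."""
    return rf // 2 + 1


def write_outcome(natural, pending, live, rf=3):
    """Return (block_for, live_targets, can_succeed) for a quorum write.

    Assumption: block_for = quorum + number of pending endpoints.
    A node listed as both natural and pending can only acknowledge once,
    so the targets are de-duplicated before counting live ones.
    """
    block_for = quorum(rf) + len(pending)
    targets = set(natural) | set(pending)
    live_targets = len(targets & set(live))
    return block_for, live_targets, live_targets >= block_for


natural = ["node3", "node4", "node5"]         # replicas of a range owned by node 3
live = {"node1", "node2", "node4", "node5"}   # node 3 was killed mid-move

# Observed behaviour: node 3 is also the pending endpoint.
observed = write_outcome(natural, ["node3"], live)

# Expected behaviour: node 1 is the pending endpoint.
expected = write_outcome(natural, ["node1"], live)

print(observed)  # (3, 2, False): needs 3 acks, only 2 live targets
print(expected)  # (3, 3, True): needs 3 acks, 3 live targets
```

With node 3 counted as pending, the write needs three acknowledgements but only two distinct
live replicas exist, which matches the timeouts described above.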

The correct pending range for this scenario is that node 1 is gaining the range
(-1844674407370955160, 1844674407370955160). So node 1, not node 3, should be added as a
destination for writes and CAS for this range.
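
The expected result can be checked by comparing replica sets before and after the move. The
sketch below is not the PendingRangeCalculator itself: it assumes simple ring-order replica
placement (the next RF nodes clockwise from the token), and replicas and gained are
hypothetical helper names.

```python
from bisect import bisect_left


def replicas(ring, token, rf=3):
    """Return the rf nodes replicating `token`, walking the ring clockwise.

    `ring` maps token -> node. A node owns (previous token, its own token].
    """
    tokens = sorted(ring)
    start = bisect_left(tokens, token) % len(tokens)  # first token >= `token`, wrapping
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(rf)]


def gained(before, after, rf=3):
    """Map each node to the (left, right] ranges it replicates only after the move."""
    boundaries = sorted(set(before) | set(after))
    result = {}
    for i, right in enumerate(boundaries):
        left = boundaries[i - 1]  # index -1 wraps around the ring for i == 0
        old = replicas(before, right, rf)
        new = replicas(after, right, rf)
        for node in new:
            if node not in old:
                result.setdefault(node, []).append((left, right))
    return result


before = {
    -9223372036854775808: "node1",
    -5534023222112865485: "node2",
    1844674407370955160: "node3",
    1844674407370955161: "node4",
    5534023222112865484: "node5",
}

# Node 3 moves to -1844674407370955160.
after = {t: n for t, n in before.items() if n != "node3"}
after[-1844674407370955160] = "node3"

pending = gained(before, after)
print(pending)
# {'node1': [(-1844674407370955160, 1844674407370955160)]}
```

Under these assumptions only node 1 gains a range, and node 3 gains nothing.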

This message was sent by Atlassian JIRA
