cassandra-commits mailing list archives

From "Jaakko Laine (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-603) pending range collision between nodes
Date Tue, 15 Dec 2009 05:57:18 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790575#action_12790575 ]

Jaakko Laine commented on CASSANDRA-603:
----------------------------------------

Unfortunately it doesn't quite work that way :)

First the case of leaving nodes:

The problem with the current implementation is that pending ranges are calculated only once,
at the time of leaving. Suppose there is a ring of nodes A, B, C, D and E with replication
factor 2. Ring status is this:

(primary, replica)
E-A, D-E
A-B, E-A
B-C, A-B
C-D, B-C
D-E, C-D

Suppose C prepares to leave. After hearing STATE_LEAVING from C, ring status will be:

(primary, replica, pending)
E-A, D-E
A-B, E-A
B-C, A-B
C-D, B-C, A-B
D-E, C-D, B-C

Now suppose B also leaves. After receiving STATE_LEAVING from B, ring status with the current
implementation will be:

(primary, replica, pending)
E-A, D-E
A-B, E-A
B-C, A-B, E-A
C-D, B-C, A-B
D-E, C-D, B-C

This is clearly wrong, as (1) E-A is being streamed to C, even though C itself is leaving, and
(2) D is not getting this range, even though it is supposed to.

In order to do this right, we will need to know at all times which nodes are leaving and calculate
ranges accordingly. An anonymous pending ranges list is not enough, as it does not tell us
which node is leaving or whether the ranges are there because of a bootstrap or a leave operation.
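
To make the leave case concrete, here is a minimal sketch (Python, not Cassandra code) that
recalculates pending ranges from the full set of leaving nodes every time that set changes.
The function names owners and pending_ranges are hypothetical, node names double as tokens,
and it assumes a range is stored on the node that ends it plus the next rf - 1 nodes clockwise,
as in the tables above.

```python
from bisect import bisect_left


def owners(ring, range_end, rf):
    """Return the rf nodes that store the range ending at range_end.

    Assumption: the range lives on the first node at or after range_end
    (clockwise) and on the rf - 1 nodes that follow it.
    """
    ring = sorted(ring)
    first = bisect_left(ring, range_end) % len(ring)  # wrap past the last node
    return [ring[(first + i) % len(ring)] for i in range(min(rf, len(ring)))]


def pending_ranges(ring, leaving, rf):
    """Map each node to the ranges it must receive once all leaving nodes are gone.

    Hypothetical helper: it is recomputed from scratch for the whole leaving set,
    and assumes at least one node stays in the ring.
    """
    ring = sorted(ring)
    future = [node for node in ring if node not in leaving]
    pending = {}
    for i, end in enumerate(ring):
        start = ring[i - 1]  # i == 0 wraps around to the last node
        current = set(owners(ring, end, rf))
        for node in owners(future, end, rf):
            if node not in current:
                pending.setdefault(node, []).append(f"{start}-{end}")
    return pending


print(pending_ranges("ABCDE", {"C"}, 2))       # {'D': ['A-B'], 'E': ['B-C']}
print(pending_ranges("ABCDE", {"B", "C"}, 2))  # {'D': ['E-A', 'A-B'], 'E': ['A-B', 'B-C']}
```

With only C leaving, this reproduces the pending column shown earlier. With both B and C
leaving, nothing is pending for C any more, and E-A goes to D instead.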


As for bootstrapping and pending range collision:

Suppose that there is a ring of nodes A, C and E, with replication factor 3. Node D bootstraps
between C and E, so its pending ranges will be E-A, A-C and C-D. Now suppose node B bootstraps
between A and C at the same time. Its pending ranges would be C-E, E-A and A-B. Now both nodes
have pending range E-A in their lists, which will cause a pending range collision even though
we're only talking about a replica range, not even a primary range. The same thing happens for
any nodes that boot simultaneously between the same two nodes. We cannot fix this by simply making
pending ranges a multimap, since that would make us unable to notice the real problem of two
nodes trying to boot using the same token. In order to do this properly, we need to know what
tokens are booting at any time.
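
As a rough illustration of tracking booting tokens rather than anonymous ranges, the sketch
below (Python, not Cassandra code) keeps a token-to-endpoint map and derives pending ranges
from it. BootstrapTracker, bootstrap_ranges and the endpoint names are hypothetical. It assumes
rf is no larger than the ring, and that each booting node's ranges are computed against the
settled ring only, as in the example above.

```python
from bisect import bisect_left


def bootstrap_ranges(ring, token, rf):
    """Ranges a node booting at `token` will hold: its primary range plus the
    rf - 1 ranges before it. Assumes rf <= len(ring)."""
    ring = sorted(ring)
    nxt = bisect_left(ring, token)  # index of the first settled node after token
    # Walk counter-clockwise from the new token through its rf predecessors.
    bounds = [token] + [ring[(nxt - 1 - k) % len(ring)] for k in range(rf)]
    return [f"{bounds[k + 1]}-{bounds[k]}" for k in range(rf)]


class BootstrapTracker:
    """Hypothetical bookkeeping: remembers which token each booting endpoint claims."""

    def __init__(self, ring, rf):
        self.ring = sorted(ring)
        self.rf = rf
        self.booting = {}  # token -> endpoint

    def add(self, token, endpoint):
        owner = self.booting.get(token)
        # The real problem: the token is already taken by another node.
        if token in self.ring or (owner is not None and owner != endpoint):
            raise ValueError(f"token {token} is already in use")
        self.booting[token] = endpoint

    def pending(self):
        """Pending ranges per endpoint; overlapping ranges are not an error."""
        return {ep: bootstrap_ranges(self.ring, tok, self.rf)
                for tok, ep in self.booting.items()}


tracker = BootstrapTracker("ACE", rf=3)
tracker.add("B", "nodeB")
tracker.add("D", "nodeD")
print(tracker.pending())
# {'nodeB': ['A-B', 'E-A', 'C-E'], 'nodeD': ['C-D', 'A-C', 'E-A']}
```

Both nodes end up with E-A pending and nothing complains, while a second endpoint calling
add("D", ...) would still be rejected.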


> pending range collision between nodes
> -------------------------------------
>
>                 Key: CASSANDRA-603
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-603
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Chris Goffinet
>             Fix For: 0.5
>
>         Attachments: 603.patch
>
>
> We bootstrapped 5 nodes on the east coast from an existing cluster (5) on west. We waited
> at least 60 seconds before starting up each node so it would start bootstrapping. We started
> seeing these types of errors:
>  INFO [GMFD:1] 2009-12-04 01:45:42,065 Gossiper.java (line 568) Node /X.X.X.140 has now joined.
> ERROR [GMFD:1] 2009-12-04 01:46:14,371 DebuggableThreadPoolExecutor.java (line 127) Error in ThreadPoolExecutor
> java.lang.RuntimeException: pending range collision between /X.X.X.139 and /X.X.X.140
>         at org.apache.cassandra.locator.TokenMetadata.addPendingRange(TokenMetadata.java:242)
>         at org.apache.cassandra.service.StorageService.updateBootstrapRanges(StorageService.java:481)
>         at org.apache.cassandra.service.StorageService.onChange(StorageService.java:402)
>         at org.apache.cassandra.gms.Gossiper.doNotifications(Gossiper.java:692)
>         at org.apache.cassandra.gms.Gossiper.applyApplicationStateLocally(Gossiper.java:657)
>         at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:610)
>         at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(Gossiper.java:978)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:38)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> ERROR [GMFD:1] 2009-12-04 01:46:14,378 CassandraDaemon.java (line 71) Fatal exception in thread Thread[GMFD:1,5,main]
> java.lang.RuntimeException: pending range collision between /X.X.X.139 and /X.X.X.140
>         at org.apache.cassandra.locator.TokenMetadata.addPendingRange(TokenMetadata.java:242)
>         at org.apache.cassandra.service.StorageService.updateBootstrapRanges(StorageService.java:481)
>         at org.apache.cassandra.service.StorageService.onChange(StorageService.java:402)
>         at org.apache.cassandra.gms.Gossiper.doNotifications(Gossiper.java:692)
>         at org.apache.cassandra.gms.Gossiper.applyApplicationStateLocally(Gossiper.java:657)
>         at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:610)
>         at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(Gossiper.java:978)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:38)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619) 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

