cassandra-commits mailing list archives

From "Jaakko Laine (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-603) pending range collision between nodes
Date Tue, 08 Dec 2009 13:29:18 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787452#action_12787452 ]

Jaakko Laine commented on CASSANDRA-603:
----------------------------------------

As for the fix, I think there are (at least) two options:

(1) Add a list of pending primary ranges (or tokens) to token metadata. Currently primary
and replica pending ranges are all in one list, so there is no way to check afterwards if
primary ranges collide.

(2) Ditch pending ranges completely and convert them to pending tokens. The problem with pending
ranges is that they are a static structure (determined at the time of bootstrap/leaving) and do
not react to token changes during the operation. This introduces a number of corner cases around
node movement that are difficult to handle correctly and difficult to prove correct, as shown by
various mail and JIRA discussions recently. If we had a list of pending tokens instead, it would adapt
to any changes that happen during the move operation. There are currently issues in pending
range handling (not cleaned up correctly in all cases, thread/atomicity issues, leaving coordination,
etc.) that would mostly go away if we switched to pending tokens instead, I think. It might be
that I'm overlooking something obvious here, but to me it seems that a dynamically adapting
pending token list would be more suitable for this.
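
To make option (2) concrete, here is a minimal sketch of what a pending token list could look like. It is only an illustration under stated assumptions, not Cassandra's TokenMetadata: the class name, the method names, the use of long tokens and the choice to take the predecessor from settled ring members only are all hypothetical. The point it tries to show is that the primary range is derived from the current ring state on every call rather than being stored at bootstrap time, so it follows token changes, and that a collision check reduces to comparing tokens.

```java
import java.util.TreeMap;

// Hypothetical sketch; none of these names are Cassandra's actual TokenMetadata API.
final class PendingTokensSketch {
    // token -> endpoint for nodes that are settled members of the ring
    private final TreeMap<Long, String> ringTokens = new TreeMap<>();
    // token -> endpoint for nodes that are still bootstrapping
    private final TreeMap<Long, String> pendingTokens = new TreeMap<>();

    public synchronized void addRingToken(long token, String endpoint) {
        ringTokens.put(token, endpoint);
    }

    // Two different endpoints may not claim the same pending token.
    public synchronized void addPendingToken(long token, String endpoint) {
        String owner = pendingTokens.get(token);
        if (owner != null && !owner.equals(endpoint)) {
            throw new IllegalStateException(
                "pending token collision between " + owner + " and " + endpoint);
        }
        pendingTokens.put(token, endpoint);
    }

    public synchronized void removePendingToken(long token) {
        pendingTokens.remove(token);
    }

    // Primary range (left, right] for a pending token, computed from the ring
    // as it is right now. Assumption: the left bound is the closest settled
    // ring token below the pending token, wrapping around to the largest one.
    // Returns null if the token is not pending or the ring is empty.
    public synchronized String pendingPrimaryRange(long pendingToken) {
        if (!pendingTokens.containsKey(pendingToken) || ringTokens.isEmpty()) {
            return null;
        }
        Long left = ringTokens.lowerKey(pendingToken);
        if (left == null) {
            left = ringTokens.lastKey(); // wrap around the ring
        }
        return "(" + left + "," + pendingToken + "]";
    }
}
```

With ring tokens 0 and 50 and a pending token 25, the derived primary range is (0,25]. If another node then settles at token 10 while the bootstrap is still running, the next call yields (10,25] without any explicit update of pending state, which is the adaptive behaviour described above.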


> pending range collision between nodes
> -------------------------------------
>
>                 Key: CASSANDRA-603
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-603
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Chris Goffinet
>             Fix For: 0.5
>
>
> We bootstrapped 5 nodes on the east coast from an existing cluster (5) on west. We waited
> at least 60 seconds before starting up each node so it would start bootstrapping. We started
> seeing these types of errors:
>  INFO [GMFD:1] 2009-12-04 01:45:42,065 Gossiper.java (line 568) Node /X.X.X.140 has now joined.
> ERROR [GMFD:1] 2009-12-04 01:46:14,371 DebuggableThreadPoolExecutor.java (line 127) Error in ThreadPoolExecutor
> java.lang.RuntimeException: pending range collision between /X.X.X.139 and /X.X.X.140
>         at org.apache.cassandra.locator.TokenMetadata.addPendingRange(TokenMetadata.java:242)
>         at org.apache.cassandra.service.StorageService.updateBootstrapRanges(StorageService.java:481)
>         at org.apache.cassandra.service.StorageService.onChange(StorageService.java:402)
>         at org.apache.cassandra.gms.Gossiper.doNotifications(Gossiper.java:692)
>         at org.apache.cassandra.gms.Gossiper.applyApplicationStateLocally(Gossiper.java:657)
>         at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:610)
>         at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(Gossiper.java:978)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:38)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> ERROR [GMFD:1] 2009-12-04 01:46:14,378 CassandraDaemon.java (line 71) Fatal exception in thread Thread[GMFD:1,5,main]
> java.lang.RuntimeException: pending range collision between /X.X.X.139 and /X.X.X.140
> java.lang.RuntimeException: pending range collision between /X.X.X.139 and /X.X.X.140
>         at org.apache.cassandra.locator.TokenMetadata.addPendingRange(TokenMetadata.java:242)
>         at org.apache.cassandra.service.StorageService.updateBootstrapRanges(StorageService.java:481)
>         at org.apache.cassandra.service.StorageService.onChange(StorageService.java:402)
>         at org.apache.cassandra.gms.Gossiper.doNotifications(Gossiper.java:692)
>         at org.apache.cassandra.gms.Gossiper.applyApplicationStateLocally(Gossiper.java:657)
>         at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:610)
>         at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(Gossiper.java:978)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:38)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619) 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

