cassandra-commits mailing list archives

From "Tom van der Woerdt (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-13348) Duplicate tokens after bootstrap
Date Mon, 27 Mar 2017 21:31:41 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944078#comment-15944078 ]

Tom van der Woerdt edited comment on CASSANDRA-13348 at 3/27/17 9:31 PM:
-------------------------------------------------------------------------

Murmur3 partitioner, yes. GossipingPropertyFileSnitch, if it matters.

I don't recall exactly how this cluster was built, but it was something like this:
 * Provision 5 nodes per DC, all but one with "-Dcassandra.join_ring=false". Keyspace with rf=dc1:2 dc2:2
 * "nodetool join" one at a time (random order)
 * Provision 30 nodes in dc1 -- all have "allocate_tokens_for_keyspace" set
 * "nodetool join" ~10
 * Decommission the first five, so we're now left with dc1:10
 * "nodetool join" the rest
 * Ditto for dc2, so we now have dc1:30 dc2:30

There's a lot of automation involved; a human might take a different route. I
decommissioned the 10 initial nodes, which had non-ideal hardware, to make room for
60 more powerful machines.

The bad tokens came from the "nodetool join" batches that joined the final 20 nodes in the DC.


was (Author: tvdw):
Murmur3 partitioner, yes. GossipingPropertyFileSnitch, if it matters.

I don't recall exactly how this cluster was built, but it was something like this:
 * Provision 5 nodes per DC, all but one with "-Dcassandra.join_ring=false". Keyspace with rf=dc1:2 dc2:2
 * "nodetool join" one at a time (random order)
 * Provision 30 nodes in dc1 -- all have "allocate_tokens_for_keyspace" set
 * "nodetool join" ~10
 * Decommission the first five, so we're now left with dc1:10
 * "nodetool join" the rest
 * Ditto for dc2, so we now have dc1:30 dc2:30

There's a lot of automation involved; a human might take a different route. I
decommissioned the 10 initial nodes, which had non-ideal hardware, to make room for
60 more powerful machines.

The last "nodetool join" batch produced two or three machines with bad tokens.

> Duplicate tokens after bootstrap
> --------------------------------
>
>                 Key: CASSANDRA-13348
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13348
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Tom van der Woerdt
>            Priority: Blocker
>             Fix For: 3.0.x
>
>
> This one is a bit scary, and probably results in data loss. After a bootstrap of a few new nodes into an existing cluster, two new nodes have chosen some overlapping tokens.
> In fact, of the 256 tokens chosen, 51 tokens were already in use on the other node.
> Node 1 log :
> {noformat}
> INFO  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,461 StorageService.java:1160 - JOINING: waiting for ring information
> INFO  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,461 StorageService.java:1160 - JOINING: waiting for schema information to complete
> INFO  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,461 StorageService.java:1160 - JOINING: schema complete, ready to bootstrap
> INFO  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,462 StorageService.java:1160 - JOINING: waiting for pending range calculation
> INFO  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,462 StorageService.java:1160 - JOINING: calculation complete, ready to bootstrap
> INFO  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,462 StorageService.java:1160 - JOINING: getting bootstrap token
> WARN  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,564 TokenAllocation.java:61 - Selected tokens [............, 2959334889475814712, 3727103702384420083, 7183119311535804926, 6013900799616279548, -1222135324851761575, 1645259890258332163, -1213352346686661387, 7604192574911909354]
> WARN  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,729 TokenAllocation.java:65 - Replicated node load in datacentre before allocation max 1.00 min 1.00 stddev 0.0000
> WARN  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,729 TokenAllocation.java:66 - Replicated node load in datacentre after allocation max 1.00 min 1.00 stddev 0.0000
> WARN  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,729 TokenAllocation.java:70 - Unexpected growth in standard deviation after allocation.
> INFO  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:44,150 StorageService.java:1160 - JOINING: sleeping 30000 ms for pending range setup
> INFO  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:43:14,151 StorageService.java:1160 - JOINING: Starting to bootstrap...
> {noformat}
> Node 2 log:
> {noformat}
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:51,937 StorageService.java:971 - Joining ring by operator request
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,513 StorageService.java:1160 - JOINING: waiting for ring information
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,513 StorageService.java:1160 - JOINING: waiting for schema information to complete
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,513 StorageService.java:1160 - JOINING: schema complete, ready to bootstrap
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,513 StorageService.java:1160 - JOINING: waiting for pending range calculation
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,514 StorageService.java:1160 - JOINING: calculation complete, ready to bootstrap
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,514 StorageService.java:1160 - JOINING: getting bootstrap token
> WARN  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,630 TokenAllocation.java:61 - Selected tokens [......, 2890709530010722764, -2416006722819773829, -5820248611267569511, -5990139574852472056, 1645259890258332163, 9135021011763659240, -5451286144622276797, 7604192574911909354]
> WARN  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,794 TokenAllocation.java:65 - Replicated node load in datacentre before allocation max 1.02 min 0.98 stddev 0.0000
> WARN  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,795 TokenAllocation.java:66 - Replicated node load in datacentre after allocation max 1.00 min 1.00 stddev 0.0000
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:53,149 StorageService.java:1160 - JOINING: sleeping 30000 ms for pending range setup
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:56:23,149 StorageService.java:1160 - JOINING: Starting to bootstrap...
> {noformat}
> e.g. 7604192574911909354 has been chosen by both.
> The joins were eight days apart, so I don't think it's a race :)
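
For anyone wanting to check two nodes for overlapping tokens, a minimal Python sketch follows. It is not part of the original report. It uses only the tokens visible in the two "Selected tokens" log lines above; the leading entries are elided in the source, so these lists are partial and the result undercounts the 51 duplicates the reporter mentions. The helper name shared_tokens is hypothetical.

```python
# Visible tail of each "Selected tokens" list from the two node logs.
# The leading entries are elided in the source, so both lists are partial.
node1_tokens = [
    2959334889475814712, 3727103702384420083, 7183119311535804926,
    6013900799616279548, -1222135324851761575, 1645259890258332163,
    -1213352346686661387, 7604192574911909354,
]
node2_tokens = [
    2890709530010722764, -2416006722819773829, -5820248611267569511,
    -5990139574852472056, 1645259890258332163, 9135021011763659240,
    -5451286144622276797, 7604192574911909354,
]


def shared_tokens(a, b):
    """Return the tokens present in both lists, sorted ascending."""
    return sorted(set(a) & set(b))


overlap = shared_tokens(node1_tokens, node2_tokens)
print(overlap)  # [1645259890258332163, 7604192574911909354]
```

Even in the truncated lists, two tokens appear on both nodes. On a live cluster the full per-node token lists would have to be collected first; how to do that is left out here.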



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
