cassandra-commits mailing list archives

From "Chris mildebrandt (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-8274) Node fails to rejoin cluster on EC2 if private IP is changed
Date Sat, 23 Sep 2017 19:22:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16177493#comment-16177493 ]

Chris mildebrandt edited comment on CASSANDRA-8274 at 9/23/17 7:21 PM:
-----------------------------------------------------------------------

I just hit this issue today with the 3.11.0 Docker image running in Kubernetes. I had 4 nodes
in the Cassandra cluster; two members were restarted (and their IPs changed) and now can't rejoin.
There's one seed that is up and reachable from all the other containers, and one other member
that is able to join. The first exception I see is this:

{noformat}
java.lang.RuntimeException: Cache schema version 38e97a53-563b-3074-b86f-c81efa980524 does not match current schema version 1bfdabae-743e-357e-a661-93984c26bc32
        at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:206) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:164) [apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:160) [apache-cassandra-3.11.0.jar:3.11.0]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
{noformat}


Then I see the one related to this issue:

{noformat}
java.lang.RuntimeException: Unable to gossip with any seeds
        at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1413) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:550) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:801) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:666) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:612) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:393) [apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:600) [apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:689) [apache-cassandra-3.11.0.jar:3.11.0]
{noformat}


Restarting the nodes didn't help. nodetool status is now reporting only two nodes, and nodetool
gossipinfo has three "empty" entries (along with the two good ones not shown):


{noformat}
/100.96.3.164
  generation:0
  heartbeat:0
  TOKENS: not present
/100.96.1.7
  generation:0
  heartbeat:0
  TOKENS: not present
/100.96.2.170
  generation:0
  heartbeat:0
  TOKENS: not present
{noformat}
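
For anyone who wants to spot these "empty" entries automatically, here is a minimal Python sketch. It assumes the gossipinfo text has the shape shown above (an unindented endpoint line containing "/", followed by indented key:value lines). The function names parse_gossipinfo and empty_entries are hypothetical helpers, not part of Cassandra or nodetool, and the healthy node in the sample is made up for illustration. Real gossipinfo output carries more fields and may differ between versions, so treat this as a starting point only.

```python
from typing import Dict, List

# Sample text in the shape shown in the comment above. The last entry is a
# hypothetical healthy node added only so the filter has something to skip.
SAMPLE = """\
/100.96.3.164
  generation:0
  heartbeat:0
  TOKENS: not present
/100.96.1.7
  generation:0
  heartbeat:0
  TOKENS: not present
/100.96.2.170
  generation:0
  heartbeat:0
  TOKENS: not present
/100.96.0.5
  generation:1506190000
  heartbeat:42
"""


def parse_gossipinfo(text: str) -> Dict[str, Dict[str, str]]:
    """Parse gossipinfo-style text into {endpoint: {field: value}}.

    Assumption: endpoint lines are unindented and end with "/<ip>";
    field lines are indented "key:value" pairs.
    """
    nodes: Dict[str, Dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if not raw[0].isspace() and "/" in raw:
            # Endpoint header: keep whatever follows the last "/".
            current = raw.strip().rsplit("/", 1)[-1]
            nodes[current] = {}
        elif current is not None and ":" in raw:
            # Split on the first ":" only, so values may contain colons.
            key, _, value = raw.strip().partition(":")
            nodes[current][key.strip()] = value.strip()
    return nodes


def empty_entries(nodes: Dict[str, Dict[str, str]]) -> List[str]:
    """Return endpoints whose generation and heartbeat are both 0."""
    return [
        endpoint
        for endpoint, fields in nodes.items()
        if fields.get("generation") == "0" and fields.get("heartbeat") == "0"
    ]


if __name__ == "__main__":
    print(empty_entries(parse_gossipinfo(SAMPLE)))
    # prints ['100.96.3.164', '100.96.1.7', '100.96.2.170']
```

To use it against a live node, the captured text of nodetool gossipinfo could be passed to parse_gossipinfo in place of SAMPLE.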





> Node fails to rejoin cluster on EC2 if private IP is changed
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-8274
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8274
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Distributed Metadata
>         Environment: Amazon EC2
>            Reporter: Joseph Clark
>            Priority: Minor
>             Fix For: 3.11.x
>
>
> Nodes in Amazon AWS EC2 Classic (not a VPC) may be assigned a new private IP if the node is stopped and then started again. In this case we have Puppet update the configured listen_address to the new private IP. However, once the Cassandra service starts, it is unable to communicate with the existing nodes (single region) and vice versa.
> 'nodetool status' shows that each node believes that it is 'UN' and the other node is 'DN'.
> 'nodetool gossipinfo' on the node that remained running shows the *old* private IP listed as the 'INTERNAL_IP' of the node that was stopped and restarted.
> The situation is resolved by restarting the Cassandra service on the node that remained running. Once it has restarted, the INTERNAL_IP is correctly updated to the new private IP. 'nodetool status' shows that both nodes are up and the cluster appears to function normally.
> This appears to me to be the root cause of https://issues.apache.org/jira/browse/CASSANDRA-7292. -Possibly https://issues.apache.org/jira/browse/CASSANDRA-8072 as well, but I am not convinced they are actually duplicates.-



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


