cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paulo Motta <pauloricard...@gmail.com>
Subject Re: Node after restart sees other nodes down for 10 minutes
Date Wed, 27 Jul 2016 21:01:50 GMT
This looks somewhat related to CASSANDRA-9630. What is the C* version?

Can you check with netstats if other nodes keep connections with the
stopped node in the CLOSE_WAIT state? And also if the problem disappears if
you run nodetool disablegossip before stopping the node?

2016-07-26 16:54 GMT-03:00 Farzad Panahi <farzad.panahi@gmail.com>:

> I am new to Cassandra and trying to figure out how the cluster behaves
> when things go south.
>
> I have a 6-node cluster, RF=3.
>
> I stop Cassandra service on a node for a while. All nodes see the node as
> DN. After a while I start the Cassandra service on DN. Interesting point is
> that all other nodes see the node now as UN but the node itself sees 4
> nodes as DN and only one node as UN. After about 10 minutes the node sees
> other nodes as up as well.
>
> I am trying to figure out where this delay is coming from.
>
> I have attached part of system.log that looks interesting. Looks like
> after Gossiper logs InetAddress <ip> is now UP the node is actually seeing
> that node as up even though the node has already handshaked with that node
> before.
>
> Any ideas?
>
> Cheers
>
> Farzad
>
> --------------------------------------
> INFO  [main] 2016-07-25 21:58:46,044 StorageService.java:533 - Cassandra
> version: 3.0.8
> INFO  [main] 2016-07-25 21:58:46,098 StorageService.java:534 - Thrift API
> version: 20.1.0
> INFO  [main] 2016-07-25 21:58:46,150 StorageService.java:535 - CQL
> supported versions: 3.4.0 (default: 3.4.0)
> INFO  [main] 2016-07-25 21:58:46,284 IndexSummaryManager.java:85 -
> Initializing index summary manager with a memory pool size of 198 MB and a
> resize interval of 60 minutes
> INFO  [main] 2016-07-25 21:58:46,343 StorageService.java:554 - Loading
> persisted ring state
> INFO  [main] 2016-07-25 21:58:46,418 StorageService.java:743 - Starting up
> server gossip
> INFO  [main] 2016-07-25 21:58:46,680 TokenMetadata.java:429 - Updating
> topology for ip-10-4-43-66.ec2.internal/10.4.43.66
> INFO  [main] 2016-07-25 21:58:46,707 TokenMetadata.java:429 - Updating
> topology for ip-10-4-43-66.ec2.internal/10.4.43.66
> INFO  [main] 2016-07-25 21:58:46,792 MessagingService.java:557 - Starting
> Messaging Service on ip-10-4-43-66.ec2.internal/10.4.43.66:7000 (eth0)
>
> INFO  [HANDSHAKE-/10.4.68.222] 2016-07-25 21:58:46,920
> OutboundTcpConnection.java:515 - Handshaking version with /10.4.68.222
> INFO  [GossipStage:1] 2016-07-25 21:58:47,011 Gossiper.java:1028 - Node /
> 10.4.68.221 has restarted, now UP
> INFO  [HANDSHAKE-/10.4.68.222] 2016-07-25 21:58:47,007
> OutboundTcpConnection.java:515 - Handshaking version with /10.4.68.222
> INFO  [main] 2016-07-25 21:58:47,030 StorageService.java:1902 - Node
> ip-10-4-43-66.ec2.internal/10.4.43.66 state jump to NORMAL
> INFO  [main] 2016-07-25 21:58:47,096 CassandraDaemon.java:644 - Waiting
> for gossip to settle before accepting client requests...
> INFO  [GossipStage:1] 2016-07-25 21:58:47,134 StorageService.java:1902 -
> Node /10.4.68.221 state jump to NORMAL
> INFO  [HANDSHAKE-/10.4.68.221] 2016-07-25 21:58:47,137
> OutboundTcpConnection.java:515 - Handshaking version with /10.4.68.221
> INFO  [GossipStage:1] 2016-07-25 21:58:47,211 TokenMetadata.java:429 -
> Updating topology for /10.4.68.221
> INFO  [GossipStage:1] 2016-07-25 21:58:47,261 TokenMetadata.java:429 -
> Updating topology for /10.4.68.221
> INFO  [GossipStage:1] 2016-07-25 21:58:47,295 Gossiper.java:1028 - Node /
> 10.4.68.222 has restarted, now UP
> INFO  [GossipStage:1] 2016-07-25 21:58:47,337 StorageService.java:1902 -
> Node /10.4.68.222 state jump to NORMAL
> INFO  [GossipStage:1] 2016-07-25 21:58:47,385 TokenMetadata.java:429 -
> Updating topology for /10.4.68.222
> INFO  [GossipStage:1] 2016-07-25 21:58:47,452 TokenMetadata.java:429 -
> Updating topology for /10.4.68.222
> INFO  [GossipStage:1] 2016-07-25 21:58:47,497 Gossiper.java:1028 - Node /
> 10.4.54.176 has restarted, now UP
> INFO  [GossipStage:1] 2016-07-25 21:58:47,544 StorageService.java:1902 -
> Node /10.4.54.176 state jump to NORMAL
> INFO  [HANDSHAKE-/10.4.54.176] 2016-07-25 21:58:47,548
> OutboundTcpConnection.java:515 - Handshaking version with /10.4.54.176
> INFO  [GossipStage:1] 2016-07-25 21:58:47,594 TokenMetadata.java:429 -
> Updating topology for /10.4.54.176
> INFO  [GossipStage:1] 2016-07-25 21:58:47,639 TokenMetadata.java:429 -
> Updating topology for /10.4.54.176
> WARN  [GossipTasks:1] 2016-07-25 21:58:47,678 FailureDetector.java:287 -
> Not marking nodes down due to local pause of 43226235115 > 5000000000
> INFO  [HANDSHAKE-/10.4.43.65] 2016-07-25 21:58:47,679
> OutboundTcpConnection.java:515 - Handshaking version with /10.4.43.65
> INFO  [GossipStage:1] 2016-07-25 21:58:47,757 Gossiper.java:1028 - Node /
> 10.4.54.177 has restarted, now UP
> INFO  [GossipStage:1] 2016-07-25 21:58:47,788 StorageService.java:1902 -
> Node /10.4.54.177 state jump to NORMAL
> INFO  [HANDSHAKE-/10.4.54.177] 2016-07-25 21:58:47,789
> OutboundTcpConnection.java:515 - Handshaking version with /10.4.54.177
> INFO  [GossipStage:1] 2016-07-25 21:58:47,836 TokenMetadata.java:429 -
> Updating topology for /10.4.54.177
> INFO  [GossipStage:1] 2016-07-25 21:58:47,887 TokenMetadata.java:429 -
> Updating topology for /10.4.54.177
> INFO  [GossipStage:1] 2016-07-25 21:58:47,926 Gossiper.java:1028 - Node /
> 10.4.43.65 has restarted, now UP
> INFO  [GossipStage:1] 2016-07-25 21:58:47,976 StorageService.java:1902 -
> Node /10.4.43.65 state jump to NORMAL
> INFO  [GossipStage:1] 2016-07-25 21:58:48,036 TokenMetadata.java:429 -
> Updating topology for /10.4.43.65
> INFO  [GossipStage:1] 2016-07-25 21:58:48,076 TokenMetadata.java:429 -
> Updating topology for /10.4.43.65
> INFO  [GossipStage:1] 2016-07-25 21:58:48,122 StorageService.java:1902 -
> Node /10.4.43.65 state jump to NORMAL
> INFO  [HANDSHAKE-/10.4.54.176] 2016-07-25 21:58:48,123
> OutboundTcpConnection.java:515 - Handshaking version with /10.4.54.176
> INFO  [HANDSHAKE-/10.4.68.221] 2016-07-25 21:58:48,123
> OutboundTcpConnection.java:515 - Handshaking version with /10.4.68.221
> INFO  [GossipStage:1] 2016-07-25 21:58:48,165 TokenMetadata.java:429 -
> Updating topology for /10.4.43.65
> INFO  [SharedPool-Worker-1] 2016-07-25 21:58:48,170 Gossiper.java:994 -
> InetAddress /10.4.54.177 is now UP
> INFO  [GossipStage:1] 2016-07-25 21:58:48,198 TokenMetadata.java:429 -
> Updating topology for /10.4.43.65
> INFO  [GossipStage:1] 2016-07-25 21:58:48,278 StorageService.java:1902 -
> Node /10.4.54.177 state jump to NORMAL
> INFO  [GossipStage:1] 2016-07-25 21:58:48,345 TokenMetadata.java:429 -
> Updating topology for /10.4.54.177
> INFO  [GossipStage:1] 2016-07-25 21:58:48,388 TokenMetadata.java:429 -
> Updating topology for /10.4.54.177
> INFO  [HANDSHAKE-/10.4.43.65] 2016-07-25 21:58:48,444
> OutboundTcpConnection.java:515 - Handshaking version with /10.4.43.65
> INFO  [HANDSHAKE-/10.4.54.177] 2016-07-25 21:58:48,444
> OutboundTcpConnection.java:515 - Handshaking version with /10.4.54.177
> INFO  [main] 2016-07-25 21:58:55,155 CassandraDaemon.java:675 - No gossip
> backlog; proceeding
> INFO  [main] 2016-07-25 21:58:55,436 NativeTransportService.java:70 -
> Netty using native Epoll event loop
> INFO  [main] 2016-07-25 21:58:55,594 Server.java:159 - Using Netty
> Version: [netty-buffer=netty-buffer-4.0.23.Final.208198c,
> netty-codec=netty-codec-4.0.23.Final.208198c,
> netty-codec-http=netty-codec-http-4.0.23.Final.208198c,
> netty-codec-socks=netty-codec-socks-4.0.23.Final.208198c,
> netty-common=netty-common-4.0.23.Final.208198c,
> netty-handler=netty-handler-4.0.23.Final.208198c,
> netty-transport=netty-transport-4.0.23.Final.208198c,
> netty-transport-rxtx=netty-transport-rxtx-4.0.23.Final.208198c,
> netty-transport-sctp=netty-transport-sctp-4.0.23.Final.208198c,
> netty-transport-udt=netty-transport-udt-4.0.23.Final.208198c]
> INFO  [main] 2016-07-25 21:58:55,691 Server.java:160 - Starting listening
> for CQL clients on /0.0.0.0:9042 (unencrypted)...
> INFO  [main] 2016-07-25 21:58:55,842 ThriftServer.java:119 - Binding
> thrift service to /0.0.0.0:9160
> INFO  [Thread-3] 2016-07-25 21:58:55,896 ThriftServer.java:136 - Listening
> for thrift clients...
> INFO  [HANDSHAKE-/10.4.54.177] 2016-07-25 21:59:24,753
> OutboundTcpConnection.java:515 - Handshaking version with /10.4.54.177
> INFO  [HANDSHAKE-ip-10-4-43-66.ec2.internal/10.4.43.66] 2016-07-25
> 21:59:33,375 OutboundTcpConnection.java:515 - Handshaking version with
> ip-10-4-43-66.ec2.internal/10.4.43.66
> INFO  [SharedPool-Worker-4] 2016-07-25 22:14:19,273 Gossiper.java:994 -
> InetAddress /10.4.43.65 is now UP
> INFO  [SharedPool-Worker-2] 2016-07-25 22:14:19,274 Gossiper.java:994 -
> InetAddress /10.4.43.65 is now UP
> INFO  [SharedPool-Worker-7] 2016-07-25 22:14:19,307 Gossiper.java:994 -
> InetAddress /10.4.43.65 is now UP
> INFO  [SharedPool-Worker-36] 2016-07-25 22:14:19,313 Gossiper.java:994 -
> InetAddress /10.4.43.65 is now UP
> INFO  [SharedPool-Worker-4] 2016-07-25 22:14:19,347 Gossiper.java:994 -
> InetAddress /10.4.43.65 is now UP
> INFO  [SharedPool-Worker-2] 2016-07-25 22:14:19,356 Gossiper.java:994 -
> InetAddress /10.4.43.65 is now UP
> INFO  [SharedPool-Worker-7] 2016-07-25 22:14:19,369 Gossiper.java:994 -
> InetAddress /10.4.43.65 is now UP
> INFO  [SharedPool-Worker-36] 2016-07-25 22:14:19,372 Gossiper.java:994 -
> InetAddress /10.4.43.65 is now UP
> INFO  [SharedPool-Worker-2] 2016-07-25 22:14:19,426 Gossiper.java:994 -
> InetAddress /10.4.43.65 is now UP
> INFO  [SharedPool-Worker-36] 2016-07-25 22:14:19,426 Gossiper.java:994 -
> InetAddress /10.4.43.65 is now UP
> INFO  [SharedPool-Worker-4] 2016-07-25 22:14:19,434 Gossiper.java:994 -
> InetAddress /10.4.43.65 is now UP
> INFO  [SharedPool-Worker-7] 2016-07-25 22:14:19,442 Gossiper.java:994 -
> InetAddress /10.4.43.65 is now UP
> INFO  [SharedPool-Worker-4] 2016-07-25 22:14:19,486 Gossiper.java:994 -
> InetAddress /10.4.43.65 is now UP
> INFO  [SharedPool-Worker-2] 2016-07-25 22:14:19,500 Gossiper.java:994 -
> InetAddress /10.4.43.65 is now UP
> INFO  [SharedPool-Worker-36] 2016-07-25 22:14:19,508 Gossiper.java:994 -
> InetAddress /10.4.43.65 is now UP
> INFO  [SharedPool-Worker-7] 2016-07-25 22:14:19,514 Gossiper.java:994 -
> InetAddress /10.4.43.65 is now UP
> INFO  [SharedPool-Worker-1] 2016-07-25 22:14:19,687 Gossiper.java:994 -
> InetAddress /10.4.54.176 is now UP
> INFO  [SharedPool-Worker-5] 2016-07-25 22:14:19,699 Gossiper.java:994 -
> InetAddress /10.4.54.176 is now UP
> INFO  [SharedPool-Worker-33] 2016-07-25 22:14:19,719 Gossiper.java:994 -
> InetAddress /10.4.54.176 is now UP
> INFO  [SharedPool-Worker-7] 2016-07-25 22:14:19,721 Gossiper.java:994 -
> InetAddress /10.4.54.176 is now UP
> INFO  [SharedPool-Worker-5] 2016-07-25 22:14:19,756 Gossiper.java:994 -
> InetAddress /10.4.54.176 is now UP
> INFO  [SharedPool-Worker-1] 2016-07-25 22:14:19,757 Gossiper.java:994 -
> InetAddress /10.4.54.176 is now UP
> INFO  [SharedPool-Worker-33] 2016-07-25 22:14:19,762 Gossiper.java:994 -
> InetAddress /10.4.54.176 is now UP
> INFO  [SharedPool-Worker-7] 2016-07-25 22:14:19,778 Gossiper.java:994 -
> InetAddress /10.4.54.176 is now UP
> INFO  [SharedPool-Worker-1] 2016-07-25 22:14:19,809 Gossiper.java:994 -
> InetAddress /10.4.54.176 is now UP
> INFO  [SharedPool-Worker-5] 2016-07-25 22:14:19,814 Gossiper.java:994 -
> InetAddress /10.4.54.176 is now UP
> INFO  [SharedPool-Worker-7] 2016-07-25 22:14:19,819 Gossiper.java:994 -
> InetAddress /10.4.54.176 is now UP
> INFO  [SharedPool-Worker-33] 2016-07-25 22:14:19,822 Gossiper.java:994 -
> InetAddress /10.4.54.176 is now UP
> INFO  [SharedPool-Worker-1] 2016-07-25 22:14:19,872 Gossiper.java:994 -
> InetAddress /10.4.54.176 is now UP
> INFO  [SharedPool-Worker-5] 2016-07-25 22:14:19,874 Gossiper.java:994 -
> InetAddress /10.4.54.176 is now UP
> INFO  [SharedPool-Worker-7] 2016-07-25 22:14:19,879 Gossiper.java:994 -
> InetAddress /10.4.54.176 is now UP
> INFO  [SharedPool-Worker-33] 2016-07-25 22:14:22,489 Gossiper.java:994 -
> InetAddress /10.4.68.222 is now UP
> INFO  [SharedPool-Worker-36] 2016-07-25 22:14:22,491 Gossiper.java:994 -
> InetAddress /10.4.68.222 is now UP
> INFO  [SharedPool-Worker-34] 2016-07-25 22:14:22,491 Gossiper.java:994 -
> InetAddress /10.4.68.222 is now UP
> INFO  [SharedPool-Worker-35] 2016-07-25 22:14:22,491 Gossiper.java:994 -
> InetAddress /10.4.68.222 is now UP
> INFO  [SharedPool-Worker-36] 2016-07-25 22:14:22,533 Gossiper.java:994 -
> InetAddress /10.4.68.222 is now UP
> INFO  [SharedPool-Worker-34] 2016-07-25 22:14:22,549 Gossiper.java:994 -
> InetAddress /10.4.68.222 is now UP
> INFO  [SharedPool-Worker-33] 2016-07-25 22:14:22,570 Gossiper.java:994 -
> InetAddress /10.4.68.222 is now UP
> INFO  [SharedPool-Worker-35] 2016-07-25 22:14:22,574 Gossiper.java:994 -
> InetAddress /10.4.68.222 is now UP
> INFO  [SharedPool-Worker-36] 2016-07-25 22:14:22,637 Gossiper.java:994 -
> InetAddress /10.4.68.222 is now UP
> INFO  [SharedPool-Worker-34] 2016-07-25 22:14:22,655 Gossiper.java:994 -
> InetAddress /10.4.68.222 is now UP
> INFO  [SharedPool-Worker-33] 2016-07-25 22:14:22,672 Gossiper.java:994 -
> InetAddress /10.4.68.222 is now UP
> INFO  [SharedPool-Worker-35] 2016-07-25 22:14:22,685 Gossiper.java:994 -
> InetAddress /10.4.68.222 is now UP
> INFO  [SharedPool-Worker-36] 2016-07-25 22:14:22,703 Gossiper.java:994 -
> InetAddress /10.4.68.222 is now UP
> INFO  [SharedPool-Worker-33] 2016-07-25 22:14:22,710 Gossiper.java:994 -
> InetAddress /10.4.68.222 is now UP
> INFO  [SharedPool-Worker-34] 2016-07-25 22:14:22,725 Gossiper.java:994 -
> InetAddress /10.4.68.222 is now UP
> INFO  [SharedPool-Worker-35] 2016-07-25 22:14:22,740 Gossiper.java:994 -
> InetAddress /10.4.68.221 is now UP
> INFO  [SharedPool-Worker-36] 2016-07-25 22:14:22,748 Gossiper.java:994 -
> InetAddress /10.4.68.221 is now UP
> INFO  [SharedPool-Worker-33] 2016-07-25 22:14:22,758 Gossiper.java:994 -
> InetAddress /10.4.68.221 is now UP
> INFO  [SharedPool-Worker-34] 2016-07-25 22:14:22,764 Gossiper.java:994 -
> InetAddress /10.4.68.221 is now UP
> INFO  [SharedPool-Worker-36] 2016-07-25 22:14:22,788 Gossiper.java:994 -
> InetAddress /10.4.68.221 is now UP
> INFO  [SharedPool-Worker-35] 2016-07-25 22:14:22,806 Gossiper.java:994 -
> InetAddress /10.4.68.221 is now UP
> INFO  [SharedPool-Worker-33] 2016-07-25 22:14:22,808 Gossiper.java:994 -
> InetAddress /10.4.68.221 is now UP
> INFO  [SharedPool-Worker-34] 2016-07-25 22:14:22,836 Gossiper.java:994 -
> InetAddress /10.4.68.221 is now UP
> INFO  [SharedPool-Worker-36] 2016-07-25 22:14:22,843 Gossiper.java:994 -
> InetAddress /10.4.68.221 is now UP
> INFO  [SharedPool-Worker-33] 2016-07-25 22:14:22,849 Gossiper.java:994 -
> InetAddress /10.4.68.221 is now UP
> INFO  [SharedPool-Worker-35] 2016-07-25 22:14:22,862 Gossiper.java:994 -
> InetAddress /10.4.68.221 is now UP
> INFO  [SharedPool-Worker-33] 2016-07-25 22:14:22,892 Gossiper.java:994 -
> InetAddress /10.4.68.221 is now UP
> INFO  [SharedPool-Worker-34] 2016-07-25 22:14:22,900 Gossiper.java:994 -
> InetAddress /10.4.68.221 is now UP
> INFO  [SharedPool-Worker-36] 2016-07-25 22:14:22,904 Gossiper.java:994 -
> InetAddress /10.4.68.221 is now UP
> INFO  [SharedPool-Worker-35] 2016-07-25 22:14:22,944 Gossiper.java:994 -
> InetAddress /10.4.68.221 is now UP
> INFO  [SharedPool-Worker-33] 2016-07-25 22:14:22,945 Gossiper.java:994 -
> InetAddress /10.4.68.221 is now UP
>
>

Mime
View raw message