Date: Sat, 11 Jan 2014 06:47:55 +0000 (UTC)
From: "Vijay (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Comment Edited] (CASSANDRA-6571) Quickly restarted nodes can list others as down indefinitely

    [ https://issues.apache.org/jira/browse/CASSANDRA-6571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868685#comment-13868685 ]

Vijay edited comment on CASSANDRA-6571 at 1/11/14 6:46 AM:
-----------------------------------------------------------

Not sure if this will fix it, because the remote machine has not responded (no echo message response).
1) I think we need to always mark nodes as dead, and mark them up only after we receive the echo response.
2) I think we need to check or reset the socket on the receiving side; we may need to markDead (or retry the message after x seconds?).
Maybe this issue shows up because we removed the hibernate during restarts? (we are not resetting the states) <== [~brandon.williams] I think the hang is on the echo response (socket.write()).

> Quickly restarted nodes can list others as down indefinitely
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-6571
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6571
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Richard Low
>            Assignee: sankalp kohli
>              Labels: gossip
>             Fix For: 2.0.5
>
>
> In a healthy cluster, if a node is restarted quickly, it may list other nodes as down when it comes back up and never list them as up. I reproduced it on a small cluster running in Docker containers.
> 1. Have a healthy 5 node cluster:
> {quote}
> $ nodetool status
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load      Tokens  Owns (effective)  Host ID                               Rack
> UN  192.168.100.1    40.88 KB  256     38.3%             92930ef6-1b29-49f0-a8cd-f962b55dca1b  rack1
> UN  192.168.100.254  80.63 KB  256     39.6%             ef15a717-9d60-48fb-80a9-e0973abdd55e  rack1
> UN  192.168.100.3    87.78 KB  256     40.8%             4e6765db-97ed-4429-a9f4-8e29de247f18  rack1
> UN  192.168.100.2    75.22 KB  256     40.6%             e89bc581-5345-4abd-88ba-7018371940fc  rack1
> UN  192.168.100.4    80.83 KB  256     40.8%             466a9798-d484-44f0-aae8-bb2b78d80331  rack1
> {quote}
> 2.
> Kill a node and restart it quickly:
> bq. kill -9 && start-cassandra
> 3. Wait for the node to come back; more often than not, it lists one or more other nodes as down indefinitely:
> {quote}
> $ nodetool status
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load      Tokens  Owns (effective)  Host ID                               Rack
> UN  192.168.100.1    40.88 KB  256     38.3%             92930ef6-1b29-49f0-a8cd-f962b55dca1b  rack1
> UN  192.168.100.254  80.63 KB  256     39.6%             ef15a717-9d60-48fb-80a9-e0973abdd55e  rack1
> DN  192.168.100.3    87.78 KB  256     40.8%             4e6765db-97ed-4429-a9f4-8e29de247f18  rack1
> DN  192.168.100.2    75.22 KB  256     40.6%             e89bc581-5345-4abd-88ba-7018371940fc  rack1
> DN  192.168.100.4    80.83 KB  256     40.8%             466a9798-d484-44f0-aae8-bb2b78d80331  rack1
> {quote}
> From trace logging, here's what I think is going on:
> 1. The nodes are all happily gossiping.
> 2. Restart node X. When it comes back up, it starts gossiping with the other nodes.
> 3. Before node X marks node Y as alive, X sends an echo message (introduced in CASSANDRA-3533).
> 4. The echo message is received by Y. To reply, Y attempts to reuse an existing connection to X. The connection is dead, but the reply is attempted on it anyway and fails.
> 5. X never receives the echo back, so Y isn't marked as alive.
> 6. X gossips to Y again, but because the endpoint's isAlive() returns true, it never calls markAlive() to properly set Y as alive.
> I tried to fix this by defaulting isAlive=false in the constructor of EndpointState. This made it less likely to mark a node as down, but it still happens.
> The workaround is to leave a node down for a while, so the connections die on the remaining nodes.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
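Vijay's two suggestions in the comment above (start every peer as dead, and flip it to up only once its echo response arrives) can be sketched as a small state tracker. This is an illustrative sketch only, assuming hypothetical names (PeerTracker, onRestart, onEchoAck, sendEcho); it is not Cassandra's actual Gossiper code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of "mark dead until echo ack": a peer is considered
// alive only after its echo response has been received.
class PeerTracker {
    private final Map<String, Boolean> alive = new ConcurrentHashMap<>();

    // On (re)start, every known peer begins as dead (suggestion 1 above)
    // and is asked to prove reachability via an echo round-trip.
    public void onRestart(Iterable<String> peers) {
        for (String peer : peers) {
            alive.put(peer, false);
            sendEcho(peer);
        }
    }

    // Only an echo response may promote a peer to alive.
    public void onEchoAck(String peer) {
        alive.put(peer, true);
    }

    // Unknown peers default to dead.
    public boolean isAlive(String peer) {
        return alive.getOrDefault(peer, false);
    }

    private void sendEcho(String peer) {
        // In Cassandra this would enqueue an echo verb message over
        // MessagingService; stubbed out in this sketch.
    }
}
```

Under this scheme a lost echo reply (step 5 in the analysis) leaves the peer dead, which is at least a safe state that a retry or socket reset (suggestion 2) can recover from.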
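The failure mode in steps 3-6 of the analysis, and Richard Low's attempted fix of defaulting isAlive=false in the EndpointState constructor, can be sketched roughly as follows. The class shapes here are simplified stand-ins, not Cassandra's real Gossiper or EndpointState internals.

```java
// Simplified stand-in for the endpoint state described in the issue.
// The attempted fix: default to false, so a freshly constructed state
// must earn "alive" through the echo handshake rather than assuming it.
class EndpointState {
    private boolean alive = false;

    boolean isAlive() { return alive; }
    void markAlive() { alive = true; }
}

// Simplified stand-in for the gossip-handling step in point 6 above.
class Gossiper {
    // Called when gossip reporting a peer's liveness arrives.
    void applyGossip(EndpointState local, boolean remoteReportsAlive) {
        // The bug surface: if the local state already claims alive
        // (e.g. stale, defaulted to true), this branch is never taken,
        // so markAlive() -- and the echo it would trigger -- is skipped.
        if (remoteReportsAlive && !local.isAlive()) {
            local.markAlive();
        }
    }
}
```

With the false default, repeated gossip rounds keep re-entering the markAlive() path until an echo succeeds, which matches why the fix made the stuck-down state less likely, though not impossible.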