From: "Stefania (JIRA)"
To: commits@cassandra.apache.org
Date: Thu, 23 Jul 2015 14:41:04 +0000 (UTC)
Subject: [jira] [Comment Edited] (CASSANDRA-9871) Cannot replace token does not exist - DN node removed as Fat Client

    [ https://issues.apache.org/jira/browse/CASSANDRA-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638902#comment-14638902 ]

Stefania edited comment on CASSANDRA-9871 at 7/23/15 2:40 PM:
--------------------------------------------------------------

bq. can you provide a dump of both nodetool gossipinfo and nodetool status?

{code}
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Tokens  Owns  Host ID                               Rack
UN  127.0.0.1  82.71 KB  256     ?     af23fcbb-fce4-495c-b5b5-b0b90ccc71c1  rack1
UN  127.0.0.2  51.57 KB  256     ?     11814d51-5120-4f9f-b5fc-d0ffa534f964  rack1
DN  127.0.0.3  51.59 KB  256     ?     0101e850-7f3a-499c-a80c-092ecf4e27e3  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless

/127.0.0.1
  generation:1437661129
  heartbeat:164
  RELEASE_VERSION:2.1.8-SNAPSHOT
  SEVERITY:0.0
  STATUS:NORMAL,-107708216716906722
  DC:datacenter1
  NET_VERSION:8
  RACK:rack1
  HOST_ID:af23fcbb-fce4-495c-b5b5-b0b90ccc71c1
  SCHEMA:fa2a3033-51b7-30c0-8926-a2b71bf0fd8a
  RPC_ADDRESS:127.0.0.1
  LOAD:52781.0
/127.0.0.2
  generation:1437661129
  heartbeat:166
  SEVERITY:0.0
  RELEASE_VERSION:2.1.8-SNAPSHOT
  STATUS:NORMAL,-1054644930469012369
  DC:datacenter1
  NET_VERSION:8
  RACK:rack1
  HOST_ID:11814d51-5120-4f9f-b5fc-d0ffa534f964
  SCHEMA:fa2a3033-51b7-30c0-8926-a2b71bf0fd8a
  RPC_ADDRESS:127.0.0.2
  LOAD:52807.0
/127.0.0.3
  generation:1437661129
  heartbeat:2147483647
  RELEASE_VERSION:2.1.8-SNAPSHOT
  SEVERITY:0.0
  STATUS:shutdown,true
  DC:datacenter1
  NET_VERSION:8
  RACK:rack1
  HOST_ID:0101e850-7f3a-499c-a80c-092ecf4e27e3
  SCHEMA:fa2a3033-51b7-30c0-8926-a2b71bf0fd8a
  RPC_ADDRESS:127.0.0.3
  LOAD:52826.0
{code}

bq. isFatClient returns true as the endpoint is not a member in TokenMetadata and that's why we fail in SS.joinTokenRing (we check to see if the token is associated with a TokenMetadata member).

Yes, this is the root cause, but why would the node not be a member? I guess handleStateNormal() is never called, so once again isFatClient() is at fault, just like for CASSANDRA-9765? Anyway, I plan to add more debug information tomorrow to find out when TokenMetadata (TM) is modified.
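To make the causal chain in the comment concrete, here is a deliberately simplified Java model. It is not Cassandra's code. `RingMembership`, `markNormal`, `checkReplaceToken` and the other names are hypothetical stand-ins for TokenMetadata, handleStateNormal() and the check in SS.joinTokenRing.

The model encodes only two rules taken from the comment:
- An endpoint known to gossip but not a TokenMetadata member is treated as a fat client.
- A replacement can only take over tokens owned by a member.

Token values are copied from the dumps and from the startup error in the issue purely for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, heavily simplified model of the checks discussed in the comment.
// These classes and methods are illustrative stand-ins, not Cassandra's API.
public class ReplaceCheckSketch {

    // Stand-in for TokenMetadata: maps each token to the ring member that owns it.
    static final class RingMembership {
        private final Map<Long, String> tokenToEndpoint = new HashMap<>();

        // Plays the role of handleStateNormal(): records the endpoint's tokens.
        void markNormal(String endpoint, long... tokens) {
            for (long token : tokens) {
                tokenToEndpoint.put(token, endpoint);
            }
        }

        boolean isMember(String endpoint) {
            return tokenToEndpoint.containsValue(endpoint);
        }

        boolean hasToken(long token) {
            return tokenToEndpoint.containsKey(token);
        }
    }

    // Known to gossip but owning no tokens => classified as a fat client.
    static boolean isFatClient(Set<String> gossipEndpoints, RingMembership ring, String endpoint) {
        return gossipEndpoints.contains(endpoint) && !ring.isMember(endpoint);
    }

    // Stand-in for the joinTokenRing check: only tokens owned by a ring member can be replaced.
    static void checkReplaceToken(RingMembership ring, long token) {
        if (!ring.hasToken(token)) {
            throw new UnsupportedOperationException(
                    "Cannot replace token " + token + " which does not exist!");
        }
    }

    public static void main(String[] args) {
        Set<String> gossip = new HashSet<>(Arrays.asList("/127.0.0.1", "/127.0.0.2", "/127.0.0.3"));
        RingMembership ring = new RingMembership();
        ring.markNormal("/127.0.0.1", -107708216716906722L);
        ring.markNormal("/127.0.0.2", -1054644930469012369L);
        // Hypothesis from the comment: markNormal is never called for the dead node.

        System.out.println(isFatClient(gossip, ring, "/127.0.0.3")); // true
        try {
            checkReplaceToken(ring, -1013652079972151677L);
        } catch (UnsupportedOperationException e) {
            // prints: Cannot replace token -1013652079972151677 which does not exist!
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, a single missing membership entry produces both symptoms: the dead node is classified as a fat client, and its token cannot be replaced.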
> Cannot replace token does not exist - DN node removed as Fat Client
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-9871
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9871
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sebastian Estevez
>            Assignee: Stefania
>             Fix For: 2.1.x
>
>
> We lost a node due to disk failure and tried to replace it with -Dcassandra.replace_address, following http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsReplaceNode.html
> The node would not come up, logging these errors in system.log:
> {code}
> INFO  [main] 2015-07-22 03:20:06,722 StorageService.java:500 - Gathering node replacement information for /10.171.115.233
> ...
> INFO  [SharedPool-Worker-1] 2015-07-22 03:22:34,281 Gossiper.java:954 - InetAddress /10.111.183.101 is now UP
> INFO  [GossipTasks:1] 2015-07-22 03:22:59,300 Gossiper.java:735 - FatClient /10.171.115.233 has been silent for 30000ms, removing from gossip
> ERROR [main] 2015-07-22 03:23:28,485 CassandraDaemon.java:541 - Exception encountered during startup
> java.lang.UnsupportedOperationException: Cannot replace token -1013652079972151677 which does not exist!
> {code}
> It is not clear why Gossiper removed the node as a FatClient, given that it was a full node before it died and had tokens assigned to it (including -1013652079972151677) in system.peers and nodetool ring.
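The log above shows the gossiper dropping the dead node as a FatClient after 30000 ms of silence. The sketch below is a minimal, hypothetical model of that behaviour. It assumes only what the comment and log state: an endpoint gossip knows about but that is not a ring (TokenMetadata) member counts as a fat client, and fat clients are dropped once silent longer than the timeout. Class, field and method names are invented for illustration, and the real Gossiper logic is more involved than this.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of a periodic gossip status check that drops
// "fat clients" (endpoints gossip knows about that own no tokens) once they
// have been silent longer than a timeout. Illustrative only, not Cassandra's code.
public class FatClientEvictionSketch {

    // 30000 ms is the value reported in the log line above.
    static final long FAT_CLIENT_TIMEOUT_MS = 30_000;

    // endpoint -> time (ms) at which gossip last heard from it
    final Map<String, Long> lastHeardMs = new HashMap<>();
    // endpoints that own tokens in the ring metadata
    final Set<String> ringMembers = new HashSet<>();

    boolean isFatClient(String endpoint) {
        return lastHeardMs.containsKey(endpoint) && !ringMembers.contains(endpoint);
    }

    // One pass of the status check; removes and returns the evicted endpoints.
    Set<String> statusCheck(long nowMs) {
        Set<String> evicted = new HashSet<>();
        for (Map.Entry<String, Long> entry : lastHeardMs.entrySet()) {
            String endpoint = entry.getKey();
            long silentForMs = nowMs - entry.getValue();
            if (isFatClient(endpoint) && silentForMs > FAT_CLIENT_TIMEOUT_MS) {
                System.out.println("FatClient " + endpoint + " has been silent for "
                        + FAT_CLIENT_TIMEOUT_MS + "ms, removing from gossip");
                evicted.add(endpoint);
            }
        }
        // Remove after iterating to avoid a ConcurrentModificationException.
        lastHeardMs.keySet().removeAll(evicted);
        return evicted;
    }

    public static void main(String[] args) {
        FatClientEvictionSketch gossip = new FatClientEvictionSketch();
        gossip.lastHeardMs.put("/10.111.183.101", 30_000L); // live node, heard recently
        gossip.lastHeardMs.put("/10.171.115.233", 0L);      // dead node's address, as in the log
        gossip.ringMembers.add("/10.111.183.101");
        // The dead node is missing from ringMembers, so it is classified as a fat client.
        System.out.println(gossip.statusCheck(31_000)); // [/10.171.115.233]
    }
}
```

Under these assumptions, a full node that drops out of the ring metadata looks exactly like a fat client to this check and is evicted once the timeout passes.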