Date: Thu, 16 Oct 2014 17:48:33 +0000 (UTC)
From: "Brandon Williams (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Subject: [jira] [Resolved] (CASSANDRA-5914) Failed replace_node bootstrap leaves gossip in weird state ; possible perf problem


     [ https://issues.apache.org/jira/browse/CASSANDRA-5914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams resolved CASSANDRA-5914.
-----------------------------------------
    Resolution: Not a Problem

> Failed replace_node bootstrap leaves gossip in weird state ; possible perf problem
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5914
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5914
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: 1.2.8
>            Reporter: Chris Burroughs
>
> A node was down for a week or two due to hardware disk failure. I tried to use replace_node to bring up a new node on the same physical host with the same IPs. (rbranson suspected that using the same IP may be issue prone.) This failed due to "unable to find sufficient sources for streaming range". However, gossip for the to-be-replaced node was left in a funky state:
> {noformat}
> /64.215.255.182
>   RACK:NOP
>   NET_VERSION:6
>   HOST_ID:4f3b214b-b03e-46eb-8214-5fab2662a06b
>   RELEASE_VERSION:1.2.8
>   DC:IAD
>   INTERNAL_IP:10.15.2.182
>   SCHEMA:59adb24e-f3cd-3e02-97f0-5b395827453f
>   RPC_ADDRESS:0.0.0.0
> {noformat}
> (See CASSANDRA-5913 for cosmetic issue with nt:status.)
> This seems (A) confusing and (B) the failed replace_token correlated with 95th percentile read latency for this cluster going from 8k microseconds to around 200k microseconds (on both DCs in a multi-dc cluster reading at CL.ONE). I don't have a good theory for the correlation but performance was bad for over an hour and returned to normal once a successful replace_token was performed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)