Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Delivered-To: mailing list user@cassandra.apache.org
Date: Fri, 6 Jan 2017 23:45:54 +0000 (UTC)
From: Sotirios Delimanolis
Reply-To: Sotirios Delimanolis
To: "user@cassandra.apache.org"
Message-ID: <2005126435.1414119.1483746354444@mail.yahoo.com>
References: <1959545284.1303559.1483731457369.ref@mail.yahoo.com> <1959545284.1303559.1483731457369@mail.yahoo.com>
Subject: Re: Logs appear to contradict themselves during bootstrap steps
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_Part_1414118_593439352.1483746354437"
archived-at: Fri, 06 Jan 2017 23:46:12 -0000

------=_Part_1414118_593439352.1483746354437
Content-Type: text/plain; charset=UTF-8

I forgot to check nodetool gossipinfo. Still, why does the first check think that the address exists, but the second doesn't?

On Friday, January 6, 2017 1:11 PM, David Berry <dberry@blackberry.com> wrote:

I’ve encountered this previously where after removing a node, gossip info is retained for 72 hours which doesn’t allow the IP to be reused during that period. You can check how long gossip will retain this information using “nodetool gossipinfo” where the epoch time will be shown with status

For example….

Nodetool gossipinfo

/10.236.70.199
  generation:1482436691
  heartbeat:3942407
  STATUS:3942404:LEFT,3074457345618261000,1483995662276
  LOAD:3942267:3.60685807E8
  SCHEMA:223625:acbf0adb-1bbe-384a-acd7-6a46609497f1
  DC:20:orion
  RACK:22:r1
  RELEASE_VERSION:4:2.1.16
  RPC_ADDRESS:3:10.236.70.199
  SEVERITY:3942406:0.25094103813171387
  NET_VERSION:1:8
  HOST_ID:2:cd2a767f-3716-4717-9106-52f0380e6184
  TOKENS:15:<hidden>

Converting it from epoch…..

local@img2116saturn101:~$ date -d @$((1483995662276/1000))
Mon Jan  9 21:01:02 UTC 2017

At the time we waited the 72 hour period before reusing the IP, I’ve not used replace_address previously.

From: Sotirios Delimanolis [mailto:sotodel_89@yahoo.com]
Sent: Friday, January 6, 2017 2:38 PM
To: User <user@cassandra.apache.org>
Subject: Logs appear to contradict themselves during bootstrap steps

We had a node go down in our cluster and its disk had to be wiped. During that time, all nodes in the cluster have restarted at least once.

We want to add the bad node back to the ring. It has the same IP/hostname. I follow the steps here for "Adding nodes to an existing cluster."

When the process is started up, it reports

A node with address <hostname>/<address> already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.

I found this error message in the StorageService using the Gossiper instance to look up the node's state. Apparently, the node knows about it. So I followed the instructions and added the cassandra.replace_address system property and restarted the process.

But it reports

Cannot replace_address /<address> because it doesn't exist in gossip

So which one is it? Does the ring know about it or not? Running "nodetool ring" does show it on all other nodes.

I've seen CASSANDRA-8138 and the conditions are the same, but I can't understand why it thinks it's not part of gossip. What's the difference between the gossip check used to make this determination and the gossip check used for the first error message? Can someone explain?

I've since retrieved the node's id and used it to "nodetool removenode". After rebalancing, I added the node back and "nodetool cleaned" up. Everything's up and running, but I'd like to understand what Cassandra was doing.

------=_Part_1414118_593439352.1483746354437
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
------=_Part_1414118_593439352.1483746354437--
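
The epoch conversion David Berry shows with `date -d` can also be sketched in Python, which may be handy when checking several nodes at once. This is only a rough sketch: the helper names `parse_status` and `expire_time` are hypothetical, and it assumes (based solely on the example output quoted in this thread) that the last comma-separated field of a LEFT status is an expiry time in epoch milliseconds.

```python
from datetime import datetime, timezone


def parse_status(line):
    """Split a gossipinfo STATUS line into (version, state, fields).

    Hypothetical helper; the layout is inferred from the sample output
    in this thread, not from Cassandra documentation.
    """
    key, version, value = line.strip().split(":", 2)
    if key != "STATUS":
        raise ValueError(f"not a STATUS line: {line!r}")
    state, *fields = value.split(",")
    return int(version), state, fields


def expire_time(line):
    """Return the last field of a LEFT status as an aware UTC datetime.

    Assumption: that field is epoch milliseconds, as in the example above.
    """
    _, state, fields = parse_status(line)
    if state != "LEFT":
        raise ValueError(f"expected LEFT state, got {state}")
    millis = int(fields[-1])
    # Integer division drops the milliseconds, like the shell arithmetic.
    return datetime.fromtimestamp(millis // 1000, tz=timezone.utc)


status = "  STATUS:3942404:LEFT,3074457345618261000,1483995662276"
print(expire_time(status).isoformat())
```

For the STATUS line quoted above this gives the same instant as the `date -d` output, 9 January 2017 at 21:01:02 UTC.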