cassandra-user mailing list archives

From Sotirios Delimanolis <sotodel...@yahoo.com>
Subject Re: Logs appear to contradict themselves during bootstrap steps
Date Fri, 06 Jan 2017 23:45:54 GMT
I forgot to check nodetool gossipinfo. Still, why does the first check think that the address
exists, but the second doesn't? 

    On Friday, January 6, 2017 1:11 PM, David Berry <dberry@blackberry.com> wrote:
 

I’ve encountered this previously: after removing a node, gossip info is retained for
72 hours, which doesn’t allow the IP to be reused during that period.

You can check how long gossip will retain this information using “nodetool gossipinfo”, where
the epoch time is shown with the status. For example:

nodetool gossipinfo

/10.236.70.199
  generation:1482436691
  heartbeat:3942407
  STATUS:3942404:LEFT,3074457345618261000,1483995662276
  LOAD:3942267:3.60685807E8
  SCHEMA:223625:acbf0adb-1bbe-384a-acd7-6a46609497f1
  DC:20:orion
  RACK:22:r1
  RELEASE_VERSION:4:2.1.16
  RPC_ADDRESS:3:10.236.70.199
  SEVERITY:3942406:0.25094103813171387
  NET_VERSION:1:8
  HOST_ID:2:cd2a767f-3716-4717-9106-52f0380e6184
  TOKENS:15:<hidden>

Converting it from epoch:

local@img2116saturn101:~$ date -d @$((1483995662276/1000))
Mon Jan  9 21:01:02 UTC 2017

At the time, we waited out the 72-hour period before reusing the IP. I’ve not used
replace_address previously.

From: Sotirios Delimanolis [mailto:sotodel_89@yahoo.com]
Sent: Friday, January 6, 2017 2:38 PM
To: User <user@cassandra.apache.org>
Subject: Logs appear to contradict themselves during bootstrap steps

We had a node go
down in our cluster and its disk had to be wiped. During that time, all nodes in the cluster
have restarted at least once.

We want to add the bad node back to the ring. It has the same IP/hostname. I followed the steps
here for "Adding nodes to an existing cluster."

When the process is started up, it reports

    A node with address <hostname>/<address> already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.

I found this error message in the StorageService using the Gossiper instance to look up the
node's state. Apparently, the node knows about it. So I followed the instructions and added the
cassandra.replace_address system property and restarted the process.

But it reports

    Cannot replace_address /<address> because it doesn't exist in gossip

So which one is it? Does the ring know about it or not? Running "nodetool ring" does show it on
all other nodes.

I've seen CASSANDRA-8138 and the conditions are the same, but I can't understand why it thinks
it's not part of gossip. What's the difference between the gossip check used to make this
determination and the gossip check used for the first error message? Can someone explain?

I've since retrieved the node's id and used it to "nodetool removenode". After rebalancing, I
added the node back and "nodetool cleaned" up. Everything's up and running, but I'd like to
understand what Cassandra was doing.
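
For anyone who wants to script the epoch conversion from David's reply, here is a minimal sketch. It assumes, based only on the example in this thread, that the last comma-separated value of a gossipinfo STATUS entry is an expiry time in epoch milliseconds. The function name status_expiry_utc is made up for illustration and is not part of Cassandra or nodetool.

```python
from datetime import datetime, timezone


def status_expiry_utc(status_entry: str) -> datetime:
    """Convert the trailing value of a gossipinfo STATUS entry to a UTC datetime.

    Assumption (taken from the example in this thread only): the last
    comma-separated field is an expiry time in milliseconds since the epoch.
    """
    expiry_ms = int(status_entry.strip().rsplit(",", 1)[-1])
    # Integer division mirrors the shell arithmetic $((ms/1000)) used in the reply.
    return datetime.fromtimestamp(expiry_ms // 1000, tz=timezone.utc)


if __name__ == "__main__":
    entry = "STATUS:3942404:LEFT,3074457345618261000,1483995662276"
    print(status_expiry_utc(entry).isoformat())  # 2017-01-09T21:01:02+00:00
```

The printed value is the same instant as the "date -d" output quoted above, just in ISO 8601 form. Pinning the result to UTC avoids depending on the local timezone of whichever machine runs the check.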

   