cassandra-user mailing list archives

From Jan <cne...@yahoo.com>
Subject RE: Problem Replacing a Dead Node
Date Fri, 22 Apr 2016 02:13:48 GMT
Mir,

You can take a node out of the cluster with nodetool decommission (run on the live node you
want to retire), or with nodetool removetoken (nodetool removenode since Cassandra 1.2; run
from any other machine) to remove a dead one.
Either way, the ranges the old node was responsible for are assigned to other nodes, and
the appropriate data is replicated there. If decommission is used, the data streams from the
decommissioned node. If removetoken is used, the data streams from the remaining replicas.
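
A small sketch of that choice, in Python, for anyone scripting it. The helper name removal_command and the sample host ID are hypothetical. The only nodetool subcommands it uses are decommission and removenode; removenode is the name that replaced removetoken from Cassandra 1.2 onward, and it takes the dead node's host ID (as listed by nodetool status) rather than a token. The function only builds the command line and does not run anything.

```python
def removal_command(node_is_alive, host_id=None):
    """Return the nodetool argv for taking a node out of the ring (sketch only)."""
    if node_is_alive:
        # Run this ON the node being retired; it streams its own data out.
        return ["nodetool", "decommission"]
    if host_id is None:
        raise ValueError("a dead node must be identified by its host ID")
    # Run this on any live node; the remaining replicas stream the data.
    return ["nodetool", "removenode", host_id]


# Hypothetical host ID, for illustration only.
print(removal_command(False, "00000000-0000-0000-0000-000000000000"))
print(removal_command(True))
```

The returned list could be passed to subprocess.run on the appropriate machine once you have confirmed which case applies.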


Hope this helps
Jan/

--------------------------------------------
On Thu, 4/21/16, Anubhav Kale <Anubhav.Kale@microsoft.com> wrote:

 Subject: RE: Problem Replacing a Dead Node
 To: "user@cassandra.apache.org" <user@cassandra.apache.org>
 Date: Thursday, April 21, 2016, 6:34 PM
 
 Reusing the bootstrapping node could have caused this, but it is hard to tell. Since you
 have only 7 nodes, have you tried a few rolling restarts of all nodes to let gossip
 settle? Also, the node is pingable from other nodes even though it says Unreachable
 below, correct?

 Based on nodetool status, it appears the node has streamed all the data it needs, but it
 doesn't think it has joined the ring yet. Does cqlsh work on that node?
    
 From: Mir Tanvir Hossain [mailto:mir.tanvir.hossain@gmail.com]
 Sent: Thursday, April 21, 2016 11:51 AM
 To: user@cassandra.apache.org
 Subject: Re: Problem Replacing a Dead Node

 Here is a bit more detail on the whole situation. I am hoping someone can help me out
 here.

 We have a seven-node cluster. One of the nodes started to have issues, but it was still
 running. We decided to add a new node and remove the problematic node after the new node
 joined. However, the new node did not join the cluster even after three days. Hence, we
 decided to go with the replacement option. We shut down the problematic node. After that,
 we stopped Cassandra on the bootstrapping node, deleted all its data, and restarted that
 node as the replacement node for the problematic node.

 Since we reused the bootstrapping node as the replacement node, I am wondering whether
 that is causing any issue. Any insights are appreciated.

 This is the output of nodetool describecluster from the replacement node and two other
 nodes.

 mhossain@cassandra-24:~$ nodetool describecluster
 Cluster Information:
         Name: App
         Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
         Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
         Schema versions:
                 80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.4, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]

 mhossain@cassandra-13:~$ nodetool describecluster
 Cluster Information:
         Name: App
         Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
         Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
         Schema versions:
                 80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]

                 UNREACHABLE: [10.0.7.91, 10.0.7.4]

 mhossain@cassandra-09:~$ nodetool describecluster
 Cluster Information:
         Name: App
         Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
         Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
         Schema versions:
                 80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]

                 UNREACHABLE: [10.0.7.91, 10.0.7.4]

 cassandra-24 (10.0.7.4) is the replacement node. 10.0.7.91 is the IP address of the dead
 node.
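
When comparing describecluster output from several nodes, a small parser can make the disagreement easier to spot. The following is a minimal sketch in Python and is not part of the original thread: the function name parse_describecluster is hypothetical, and the regular expression assumes the output layout quoted above (a schema version UUID or the word UNREACHABLE, followed by a bracketed list of addresses).

```python
import re


def parse_describecluster(text):
    """Map each schema version (or 'UNREACHABLE') to its list of node IPs."""
    # Mail archives often wrap long lines, so collapse all whitespace first.
    flat = " ".join(text.split())
    groups = {}
    pattern = r"([0-9a-f-]{36}|UNREACHABLE): \[([^\]]*)\]"
    for key, ips in re.findall(pattern, flat):
        groups[key] = [ip.strip() for ip in ips.split(",") if ip.strip()]
    return groups


# Output as seen from cassandra-13 in this thread.
sample = """
Schema versions:
    80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80,
10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160,
10.0.7.176]

    UNREACHABLE: [10.0.7.91, 10.0.7.4]
"""
groups = parse_describecluster(sample)
print(groups["UNREACHABLE"])  # ['10.0.7.91', '10.0.7.4']
```

Run against the three outputs above, this shows that the replacement node (10.0.7.4) lists itself under the shared schema version, while the other two nodes list it as UNREACHABLE.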
 
 
    
 
 
 -Mir

 On Thu, Apr 21, 2016 at 10:02 AM, Mir Tanvir Hossain <mir.tanvir.hossain@gmail.com> wrote:

 Hi, I am trying to replace a dead node by following
 https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html.
 It has been 3 full days since the replacement node started, and the node is still not
 showing up as part of the cluster in OpsCenter. I was wondering whether the delay is due
 to the fact that I have a test keyspace with a replication factor of one. If I delete
 that keyspace, would the new node successfully replace the dead node? Any general insight
 would be hugely appreciated.

 Thanks,
 Mir
