You mention a “stored ring view”. Can it be that this stored ring view was out of sync with the actual (gossip) situation?
After checking the code: not as much as I thought :)
The stored ring state is just the map from IP address to token; I thought it had a little more in there.
On 2/02/2012, at 9:44 PM, Rene Kochen wrote:
A restart of node1 fixed the problem. The only thing I saw in the log of node1 before the problem was the following:

InetAddress /172.27.70.135 is now dead.
InetAddress /172.27.70.135 is now UP

After this, the nodetool ring command showed node 172.27.70.135 as dead.

You mention a “stored ring view”. Can it be that this stored ring view was out of sync with the actual (gossip) situation?

From: aaron morton [mailto:firstname.lastname@example.org]
Sent: Wednesday, 1 February 2012 21:03
Subject: Re: Node down
Without knowing much more, I would try this…
* Restart each node in turn, and watch the logs to see what each one says about the other.
* If the restart does not fix it, try the -Dcassandra.load_ring_state=false JVM option when starting the node. That tells it to ignore its stored ring view and use what gossip is telling it. Add it as a new line at the bottom of cassandra-env.sh.
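For reference, the new line could look roughly like the sketch below. It assumes your cassandra-env.sh collects its options in the JVM_OPTS variable, as the stock file does; check your own copy before pasting. Note the leading -D, which is what makes it a JVM system property.

```shell
# Appended at the bottom of cassandra-env.sh (assumes the script builds JVM_OPTS).
# Tells the node to ignore its stored ring view on startup and rely on gossip.
JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"
```

Once the node has come back up with a correct view of the ring, you would probably want to remove the line again so later restarts behave normally.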
If it's still failing, watch the logs and see what it says when it marks the other node as down.
On 1/02/2012, at 11:12 PM, Rene Kochen wrote:
I have a cluster with seven nodes. If I run the nodetool ring command on all nodes, I see the following: node1 says that node2 is down, node2 says that node1 is down, and all other nodes say that everyone is up. I see no network-related problems, and no problems between node1 and node2 either.