Hi all,
I'm testing our procedures for handling some Cassandra failure scenarios and there's something I don't understand.

I'm testing on a 3-node cluster with a replication_factor of 3.
I stopped one of the nodes for 5 or so minutes and ran some application tests. Everything was fine.

Then I started Cassandra on that node again, and it refuses to rejoin the ring. It sees itself as up but not the other nodes. The other nodes see themselves as up but don't see the restarted node as up.

I deliberately haven't followed any of the token replacement methods outlined in the docs. I'm working on the assumption that a short outage on one node shouldn't require extraordinary action.

Nor do I want to have to stop every node before bringing them up one by one.

What am I missing? Am I forced into those time-consuming methods every time I want to restart a node?




