Hi all,
I'm testing our procedures for handling some Cassandra failure scenarios and I'm not understanding something.

I'm testing on a 3 node cluster with a replication_factor of 3.
I stopped one of the nodes for 5 or so minutes and ran some application tests. Everything was fine.

Then I started Cassandra on that node again, but it refuses to rejoin the ring. It sees itself as up but not the other nodes; the other nodes see each other but report it as down.
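For reference, this is roughly how I'm checking node state (standard nodetool commands run on each host; output trimmed, and the exact symptoms may vary with your setup):

```shell
# On the restarted node: it reports itself UN (Up/Normal)
# but shows the other two nodes as DN (Down/Normal).
nodetool status

# On either of the other nodes: they see each other as UN
# but show the restarted node as DN.
nodetool status

# Dump gossip state for every endpoint, to compare
# generation/heartbeat values between the restarted node and its peers.
nodetool gossipinfo

# Sanity check that all nodes still agree on cluster name and schema.
nodetool describecluster
```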

I deliberately haven't followed any of the token replacement methods outlined in the docs. I'm working on the assumption that a brief outage on one node shouldn't require extraordinary action.

Nor do I want to have to stop every node before bringing them up one by one.

What am I missing? Am I forced into those time-consuming methods every time I want to restart a node?

Thoughts?

Cheers,
Edward

--

Edward Sargisson

senior java developer
Global Relay

edward.sargisson@globalrelay.net


866.484.6630 
New York | Chicago | Vancouver 
London  (+44.0800.032.9829)  Singapore  (+65.3158.1301)
