2009/12/3 Coe, Robin <robin.coe@bluecoat.com>

So, considering that I currently have to take down a node to make a CF change, I'm wondering how to perform automatic failover from my application?  Is there a mechanism by which I can request from Cassandra all the destination IP:ports for the nodes in a cluster, so I can adapt dynamically?  For example, if I ramp up/down Cassandra instances based on server load, I would like my application to automatically know what servers are available, to execute automatic reconnection when the node I'm connected to goes down.

This is a bigger question, and one that merits some general discussion.

Nodes will need to be taken down, not just for CF changes but for any other operational reason, and you won't want an outage while that happens.

Applications querying data will still need to be able to do so, and those inserting data will need to be able to keep inserting (or to handle a backlog, if that is acceptable to the end users).
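To make the "handle a backlog" option concrete, here is a minimal Python sketch. It assumes the application has some callable that performs one insert and raises ConnectionError when the node is unreachable; the names WriteBacklog, send and max_pending are made up for illustration and are not part of any Cassandra client.

```python
from collections import deque


class WriteBacklog:
    """Queue inserts while the node is unreachable and replay them in order.

    `send` is a hypothetical callable supplied by the application; it is
    assumed to perform one insert and raise ConnectionError on failure.
    """

    def __init__(self, send, max_pending=10000):
        self.send = send
        self.max_pending = max_pending
        self.pending = deque()

    def write(self, item):
        # Refuse new writes once the backlog is full rather than grow forever.
        if len(self.pending) >= self.max_pending:
            raise OverflowError("backlog full")
        self.pending.append(item)
        self.flush()

    def flush(self):
        # Replay queued items oldest first; stop at the first failure so
        # ordering is preserved. Returns True when the backlog is empty.
        while self.pending:
            try:
                self.send(self.pending[0])
            except ConnectionError:
                return False
            self.pending.popleft()
        return True
```

The application would call write() for every insert and call flush() periodically (or after reconnecting) to drain anything queued during the outage. Whether an in-memory queue is good enough depends on how much data loss the end users can tolerate if the application itself dies.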

My suggestions are:

1. Use an IP-layer load balancer like LVS and have the servers add/remove themselves from the pool as they go up/down.
2. Have all your app servers also be Cassandra nodes, and always connect locally. If the local Cassandra instance is unhealthy, remove the whole app server from the LVS pool.

Of course you don't need to use LVS; any other IP-based load balancer would do. And of course, Cassandra itself needs a fixed, unchanging address per node, so the load balancer would need to make sure it didn't use that address.
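As a rough sketch of how a server could decide when to add or remove itself from the pool, the Python below treats a node as healthy if a TCP connection to its client port succeeds. That is an assumption made for illustration (a real check might issue an actual query), as are the function names and the port 9160 used in the usage note. How the decision is applied depends on the load balancer, so the sketch only returns "add" or "remove".

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the node as unhealthy.
        return False


def pool_action(healthy, in_pool):
    """Decide what to do with this server's pool membership.

    Returns "add", "remove", or None when no change is needed.
    """
    if healthy and not in_pool:
        return "add"
    if not healthy and in_pool:
        return "remove"
    return None
```

A periodic job on each app server could call something like pool_action(port_is_open("127.0.0.1", 9160), in_pool) and then run whatever command the load balancer provides for changing pool membership.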

In the event of a normal (i.e. administrative) shutdown, the admin could manually mark the node as down before doing the maintenance.

I did some work on an experimental load balancer I call "Fluffy Cluster" here:

http://code.google.com/p/fluffy-linux-cluster/

This is not production-ready yet but could be useful.

Mark