On Tue, Jul 14, 2009 at 3:00 AM, Mark Robson <firstname.lastname@example.org> wrote:
My guess would be:
I have 3 production servers. Is it better to:
A. Start Cassandra on one node and add the other seeds later
B. Start Cassandra on all 3 nodes
If I do A, when I later add 2 nodes, will Cassandra pick up the other two nodes and start distributing the load fairly?
1. If you only have 3 production servers, Cassandra may not do much for you. You will probably only care if you have lots more servers. 3 servers is a reasonable minimum for a test / dev environment.
2. All of your servers should have static IPs. Make sure that at least 2-3 of them are unlikely to go away, and put those in as seeds; the other servers can come and go, change IP address, etc.
I would set up 2-3 servers which I expected to be unlikely to go away (i.e. they won't be taken out any time soon), and code their IPs into the seeds. The other servers can use those to find each other.
Your ops team should also be aware that if they get rid of those "seed" servers, new boxes should at some point be deployed to take over those IPs, so that at least two seeds are always actively running Cassandra. That way your other nodes can still find one another.
Having only one seed server would create a single point of failure, which you don't want.
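As a rough illustration of the "at least two seeds running" rule, here is a minimal Python sketch an ops team could adapt. It is not part of Cassandra: the function names, the `MIN_LIVE_SEEDS` constant, and the addresses are all hypothetical, and the liveness probe is passed in as a function because how you check a node (ping, TCP connect, monitoring system) is up to you.

```python
from typing import Callable, Iterable, List

# Rule of thumb from this thread: never rely on a single seed.
MIN_LIVE_SEEDS = 2


def live_seeds(seeds: Iterable[str], is_alive: Callable[[str], bool]) -> List[str]:
    """Return the seeds that the supplied probe reports as reachable."""
    return [host for host in seeds if is_alive(host)]


def seed_list_is_safe(seeds: Iterable[str], is_alive: Callable[[str], bool]) -> bool:
    """True when at least MIN_LIVE_SEEDS of the seeds are currently reachable."""
    return len(live_seeds(seeds, is_alive)) >= MIN_LIVE_SEEDS


if __name__ == "__main__":
    # Hypothetical addresses and a fake probe, for illustration only.
    seeds = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
    currently_up = {"10.0.0.1", "10.0.0.3"}
    print(seed_list_is_safe(seeds, lambda host: host in currently_up))
```

Running a check like this before decommissioning a box would flag the case where removing it leaves fewer than two seeds alive.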
If you have a segmented network (e.g. routed, different racks, different datacentres with VPN between them etc), you could put two seeds in each segment, which would make discovery tolerant of a partition.
But having said that, it's relatively inconvenient to have a large number of seeds as you'd need to keep deploying new config files to all your nodes.
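To make the trade-off concrete, the "two seeds per segment" idea can be sketched as a small Python helper that builds one short seed list from an inventory of hosts. Everything here is an assumption for illustration: the segment names, the addresses, the `pick_seeds` and `seeds_setting` helpers, and the convention that each segment lists its most stable hosts first. How the resulting list is written into your Cassandra configuration depends on the version you run.

```python
from typing import Dict, List

# Two seeds per segment, as suggested above; more means more config churn.
SEEDS_PER_SEGMENT = 2


def pick_seeds(segments: Dict[str, List[str]],
               per_segment: int = SEEDS_PER_SEGMENT) -> List[str]:
    """Take the first `per_segment` hosts of each segment, in sorted segment order.

    Assumes each segment's host list is ordered with the most stable hosts first.
    """
    seeds: List[str] = []
    for name in sorted(segments):
        seeds.extend(segments[name][:per_segment])
    return seeds


def seeds_setting(segments: Dict[str, List[str]]) -> str:
    """Render the chosen seeds as one comma-separated string."""
    return ",".join(pick_seeds(segments))


if __name__ == "__main__":
    # Hypothetical inventory: two racks joined by a routed link.
    inventory = {
        "rack-a": ["10.0.1.1", "10.0.1.2", "10.0.1.3"],
        "rack-b": ["10.0.2.1", "10.0.2.2"],
    }
    print(seeds_setting(inventory))
```

Generating the list from a single inventory keeps it short and identical on every node, which limits how often new config files have to be pushed out.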