I am assuming here that you want to sync all of the hundreds of nodes once the application is up and running. I suspect this would flood the network and could even affect the machines themselves memory-wise. How are you going to maintain the nodes (compaction and repair)?



-----Original Message-----
From: Emalayan Vairavanathan <svemalayan@yahoo.com>
To: user <user@cassandra.apache.org>
Sent: Wed, May 22, 2013 8:31 pm
Subject: Creating namespace and column family from multiple nodes concurrently

Hi all,

I am implementing a distributed application that runs on hundreds of machines concurrently. This application is going to use Cassandra as its underlying storage.

The application creates the schema (namespace and column families) during its initialization phase. As far as I can tell, I have two options for creating the schema:

Option 1: Use a single node for schema creation.
Option 2: Have all the nodes (> 100) run the same schema creation logic (each node first checks whether the schema already exists, and creates it only if it does not).
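To make the concern behind Option 2 concrete, here is a minimal sketch of the check-then-create logic. It does not talk to Cassandra at all: `InMemorySchemaStore`, `AlreadyExistsError`, `ensure_schema`, `run_nodes` and the names `app_ks`, `cf_users`, `cf_events` are all hypothetical, and the store is assumed to reject a duplicate create with an error. It only shows that the "check" and the "create" are two separate steps, so several nodes can pass the check at the same moment; it does not model how Cassandra 1.2.2 propagates schema changes between nodes, which is the actual open question.

```python
import threading


class AlreadyExistsError(Exception):
    """Hypothetical error raised when the same keyspace is created twice."""


class InMemorySchemaStore:
    """Hypothetical in-memory stand-in for the cluster schema; NOT a Cassandra API."""

    def __init__(self):
        self._lock = threading.Lock()
        self.keyspaces = {}
        self.create_attempts = 0

    def keyspace_exists(self, name):
        with self._lock:
            return name in self.keyspaces

    def create_keyspace(self, name, column_families):
        # Assumption: the store itself rejects a duplicate create atomically.
        with self._lock:
            self.create_attempts += 1
            if name in self.keyspaces:
                raise AlreadyExistsError(name)
            self.keyspaces[name] = list(column_families)


def ensure_schema(store, name, column_families):
    """Option 2: every node checks first, then creates if the schema is missing.

    Two nodes can both see "missing" and both attempt the create, so the
    loser treats the duplicate-create error as success instead of failing.
    """
    if store.keyspace_exists(name):
        return "already present"
    try:
        store.create_keyspace(name, column_families)
        return "created"
    except AlreadyExistsError:
        return "lost the race"


def run_nodes(num_nodes):
    store = InMemorySchemaStore()
    results = [None] * num_nodes
    # Release all simulated nodes at once to make the race likely.
    start = threading.Barrier(num_nodes)

    def node(i):
        start.wait()
        results[i] = ensure_schema(store, "app_ks", ["cf_users", "cf_events"])

    threads = [threading.Thread(target=node, args=(i,)) for i in range(num_nodes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store, results


if __name__ == "__main__":
    store, results = run_nodes(100)
    print(results.count("created"), "node(s) created the schema")
    print(store.create_attempts, "create attempt(s) reached the store")
```

Exactly one simulated node reports "created" on every run, but the number of create attempts that reach the store varies from run to run, which is the extra schema traffic Option 2 can generate. Option 1 avoids the race entirely, at the cost of having to designate one node and make the others wait for the schema.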

I would prefer Option 2 because it keeps the initialization phase simple. However, I am not sure how Cassandra behaves if multiple nodes try to create the same schema (namespace and column families) concurrently. It would be nice if someone could explain the implications of Option 2 with Cassandra version 1.2.2.

Please let me know if you have any questions.

Thank you