I'm evaluating Cassandra and really like its simplicity and ease of maintenance compared to my other alternative, HBase. While comparing the two products, a few questions came up that I couldn't find answers to:
  1. Is the procedure described in ticket CASSANDRA-44 really the way to do schema changes in the latest release? I'm not sure what your thoughts are on this, but in our experience every release of our software requires schema changes, because we add new column families for indexes.
    Just as a note: the HBase way of managing schema seems to make much more sense than keeping it in configuration files that have to be deployed to all the nodes in the cluster. Any idea on the timeframe for 0.7?
  2. Our application needs a lot of range scans. Is there anything being done to improve the poor range scan performance as reflected here: http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf ?
  3. What is the reason for the replication strategy with two DCs? As far as I understand it means that only one replica will exist in the second DC. It also means that quorum reads will fail when attempted on the second DC while the first DC is down. Am I missing something?
  4. Are there any plans for an inter-cluster replication option? I mean having two clusters running in two DCs, each standalone, but replicating data between them. This could avoid the problem mentioned above, as well as the high cost of inter-DC traffic when doing read repair on every read.
  5. From everything I've read, I couldn't tell whether load balancing is local or global. In other words, what exactly happens when a new node is added? Does it only take load from its two neighbors on the ring, or does the rebalance propagate through the ring so that all the nodes end up evenly balanced?
  6. I see that Hadoop support is coming in 0.6, but from following the ticket on Jira (CASSANDRA-342) I couldn't tell whether it will support the OrderPreservingPartitioner.
  7. Do the clients have to be recompiled and deployed when a new version of Cassandra is deployed, or are new releases backward compatible?
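
To make the concern in question 3 concrete, here is a minimal sketch of the quorum arithmetic as I understand it. It assumes a replication factor of 3 with two replicas in the first DC and one in the second; those numbers and the helper names are my own illustration, not taken from the Cassandra code or documentation.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum: a strict majority."""
    return replication_factor // 2 + 1


def quorum_read_possible(live_replicas: int, replication_factor: int) -> bool:
    """True if enough replicas are reachable to satisfy a quorum read."""
    return live_replicas >= quorum(replication_factor)


# Assumed layout (hypothetical): RF = 3, two replicas in dc1, one in dc2.
RF = 3
replicas_per_dc = {"dc1": 2, "dc2": 1}

# If dc1 is down, only the dc2 replica is reachable.
live = replicas_per_dc["dc2"]
print("quorum size:", quorum(RF))
print("quorum read possible with dc1 down:", quorum_read_possible(live, RF))
```

With these assumed numbers a quorum needs 2 replicas, so a single surviving replica in the second DC cannot serve a quorum read, which is the failure I am asking about.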
Thanks for your help.

Eran Kutner,