Thanks for the quick response! I will reconsider the schema.

However, the problem still troubles me. How are schema changes supposed to be done? Should I serialize them? Should I halt other cluster operations while I do the schema change? Is this a known problem with Cassandra?
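To make the "serialize them" option concrete, here is a rough sketch of what I have in mind: apply one schema change at a time, then poll until every reachable node reports the same schema version before allowing the next change. The names `apply_schema_change`, `change`, and `get_versions` are hypothetical. `get_versions` is assumed to return a mapping of schema version to hosts, which as far as I understand is the shape returned by the Thrift call `describe_schema_versions`, with unreachable hosts listed under the key "UNREACHABLE". Ignoring unreachable nodes is an assumption of this sketch, not a recommendation.

```python
import threading
import time

# Serializes schema changes within this process only (assumption: a single
# process issues all schema changes; multiple clients would need an
# external lock instead).
_schema_lock = threading.Lock()


def schema_agreed(versions):
    """Return True if all reachable nodes report one schema version.

    `versions` maps schema version -> list of hosts. Hosts under the
    "UNREACHABLE" key are ignored (an assumption of this sketch).
    """
    live = [v for v in versions if v != "UNREACHABLE"]
    return len(live) == 1


def apply_schema_change(change, get_versions, timeout=30.0, poll=0.5):
    """Run one schema change, then wait for the cluster to agree.

    `change` is a hypothetical callable that performs the change (for
    example, adding a column family). `get_versions` is a hypothetical
    callable returning the version -> hosts mapping.
    Returns True on agreement, False if `timeout` seconds pass first.
    """
    with _schema_lock:
        change()
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if schema_agreed(get_versions()):
                return True
            time.sleep(poll)
        return False
```

With something like this, inserts into a new column family would only start once `apply_schema_change` returns True, instead of immediately after the create call.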

The other question, and I think the more important one for me now: how do I repair the cluster without losing data once the schemas diverge? Right now the only way I have is to erase all data and have the cluster start empty. Should this problem ever happen in production, it's important that there is a way to recover the data.

On Fri, Apr 15, 2011 at 1:57 PM, Dan Hendry <> wrote:

Uh... don’t create a column family per user. Column families are meant to be fairly static, conceptually equivalent to a table in a relational database. Why do you need (or even want) a CF per user? Reconsider your data model; a single column family with an inverted index for a ‘user’ column is probably more what you are looking for. Operationally, the fewer CFs the better.
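As a rough illustration of that layout (not Cassandra client code), the sketch below models one shared column family plus an inverted index on the user, using plain Python dictionaries. The class and field names (`DocumentStore`, `documents`, `docs_by_user`, `body`) are made up for the example. The point is only that adding a user adds rows, not schema.

```python
from collections import defaultdict


class DocumentStore:
    """In-memory stand-in for one shared column family plus a user index."""

    def __init__(self):
        # Stand-in for the single column family: row key -> columns.
        self.documents = {}
        # Stand-in for the inverted index: user -> set of row keys.
        self.docs_by_user = defaultdict(set)

    def add(self, doc_id, user, body):
        # If the document already exists, drop its old index entry first
        # so the index never points at a stale owner.
        old = self.documents.get(doc_id)
        if old is not None:
            self.docs_by_user[old["user"]].discard(doc_id)
        self.documents[doc_id] = {"user": user, "body": body}
        self.docs_by_user[user].add(doc_id)

    def for_user(self, user):
        # Look up row keys through the index, then fetch the rows.
        keys = sorted(self.docs_by_user.get(user, ()))
        return [self.documents[k] for k in keys]
```

New users show up as new keys in the index, so no schema change is needed when a user writes a first document.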




From: Alejandro Perez []
Sent: April-15-11 16:39
Cc: Support
Subject: Schemas diverging while dynamically creating CF.




We're testing Cassandra for integration with indextank. In this first try, we're creating one column family for each user. In practice, on the first run and for the first few documents (a few hundred), a new CF is created and a document is immediately added to it. A few (up to 50) requests of this type are issued in parallel (for different column families).


The end result, which is quite repeatable, is a cluster split across different schema versions that never agree.


Any thoughts?







Alejandro Perez

follow us @indextank | read our blog | subscribe our user mailing list

