Hi Patricio,

It's hard to comment on your original questions without knowing the details of your domain-specific data model and your data processing expectations.

With respect to lumping things into one big row, Cassandra's data model imposes a limit. You have column families (CF) and super column families (SCF). That is, you get at most 2 levels of nesting for an atomic value update, so you cannot lump arbitrarily complex data into a single big row.

Even though the update to one particular row is atomic, you can still run into concurrent read-write operations that conflict with each other.

For example, suppose one of your column values is a list.
The old value is: "a, b, c"
The operation is: add "d" to that list.
The desired new value is: "a, b, c, d"
If another concurrent operation tries to add "e" to the list, you still have a problem under the present atomic row-update semantics in Cassandra: each client reads "a, b, c" and writes back its own version of the list, so one of the two additions is lost.
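
To make the conflict concrete, here is a minimal sketch in Python. It does not use a real Cassandra client; ToyStore and its get/put methods are hypothetical stand-ins for a store where each individual write is atomic. The interleaving is fixed by hand so the outcome is repeatable.

```python
import threading


class ToyStore:
    """Hypothetical in-memory store; each get/put is atomic on its own."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def put(self, key, value):
        with self._lock:
            self._data[key] = value


def lost_update_demo():
    store = ToyStore()
    store.put("list", "a, b, c")

    # Both clients read the old value before either one writes.
    seen_by_client_1 = store.get("list")
    seen_by_client_2 = store.get("list")

    # Each write is atomic, but the second one overwrites the first.
    store.put("list", seen_by_client_1 + ", d")
    store.put("list", seen_by_client_2 + ", e")

    return store.get("list")


if __name__ == "__main__":
    print(lost_update_demo())
```

With this interleaving the final value contains "e" but not "d", even though neither write was ever partially applied.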

On the other hand, there are a number of application scenarios where update operations can safely be treated as idempotent.
E.g. bulk loading data from flat files into Cassandra.
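
As a rough illustration of why a bulk load can be treated as idempotent, the sketch below uses a plain Python dict as a stand-in for the store (not a real Cassandra API; bulk_load and the sample rows are made up for this example). Because each write sets a value under a fixed key and column, replaying the same file leaves the same state.

```python
def bulk_load(store, rows):
    """Write each (key, column, value) triple into a dict-of-dicts store.

    A write overwrites whatever is already at (key, column), so applying
    the same rows a second time does not change the result.
    """
    for key, column, value in rows:
        store.setdefault(key, {})[column] = value
    return store


# Hypothetical rows, as if parsed from a flat file.
ROWS = [
    ("user1", "name", "Ann"),
    ("user1", "city", "Lima"),
    ("user2", "name", "Bob"),
]


def load_once():
    return bulk_load({}, ROWS)


def load_twice():
    store = bulk_load({}, ROWS)
    return bulk_load(store, ROWS)


if __name__ == "__main__":
    print(load_once() == load_twice())
```

Contrast this with the list-append case above, where repeating the operation would change the stored value.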

If your main worry is the client process crashing, then regardless of what ACID properties Cassandra can provide, you still want a way to verify whether Cassandra has stored the desired state and/or to log the processed update operations in the context of bulk loading. Then you can decide whether a particular data update needs to be repeated. A full-fledged ACID database ("all or nothing" semantics) can reduce the complexity of verifying that the data was stored successfully, but it cannot remove that concern completely. Consider the case where the client process crashes right at the moment of "dbConn.commit()": you still don't know for sure whether that update operation has gone through.
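
Here is one way the verify-and-log idea could look, sketched in Python under simplifying assumptions: the store is a plain dict, the processed log is an in-memory set (a real loader would persist it), and the crash is simulated with an exception raised after the write but before the log entry. The function name and the crash_after_writes parameter are hypothetical.

```python
def load_with_verification(store, records, processed_log, crash_after_writes=None):
    """Apply (op_id, key, value) records, skipping work already known to be done.

    Returns the number of writes actually issued in this run.
    """
    writes = 0
    for op_id, key, value in records:
        if op_id in processed_log:
            continue  # logged as done in an earlier run
        if store.get(key) != value:
            # Verification failed: the desired state is not stored yet.
            store[key] = value
            writes += 1
            if crash_after_writes is not None and writes == crash_after_writes:
                # Simulated crash: the write went through, the log entry did not.
                raise RuntimeError("simulated client crash")
        processed_log.add(op_id)
    return writes


def demo():
    store, log = {}, set()
    records = [(1, "k1", "v1"), (2, "k2", "v2"), (3, "k3", "v3")]
    try:
        load_with_verification(store, records, log, crash_after_writes=2)
    except RuntimeError:
        pass  # the client "restarts" below
    writes_on_restart = load_with_verification(store, records, log)
    return store, log, writes_on_restart


if __name__ == "__main__":
    print(demo())
```

On restart, the second record is not in the log, but the check against the store shows it was already written, so only the third record is sent again. This only works because the writes are idempotent; if in doubt, repeating the write is also safe in that case.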

Hope this email helps.


Alex Yiu

On Tue, Jul 20, 2010 at 2:03 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
2010/7/20 Patricio Echagüe <patricioe@gmail.com>:
> Would it be bad design to store all the data that need to be
> consistent under one big key?

That really depends how unnatural it is from a query perspective. :)

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support