I'm pretty new to Cassandra and am currently doing a proof of concept, so I thought it would be a good idea to ask whether my data model is sane.

The data I have, and need to query, is reasonably simple. It consists of about 10 million entities, each of which has a set of key/value properties for each day over about 10 years. There are 50-100 distinct keys, and there will be a lot of overlap in the keys present for each <entity, day> pair.
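A rough estimate of the scale involved may help when weighing the two row-key choices described below. It assumes 365 days per year and data for every entity on every day, so it overstates the real counts if some days or keys are missing.

```python
# Back-of-envelope scale estimate (assumes 365 days/year and data for
# every entity on every day; both are simplifications).
entities = 10_000_000
days = 10 * 365             # ~10 years of daily data
max_keys = 100              # upper end of the 50-100 key range

# Row key = entity + date: one narrow row per <entity, day> pair.
composite_rows = entities * days
composite_max_cols_per_row = max_keys

# Row key = entity only (one supercolumn per date): fewer, much wider rows.
entity_rows = entities
entity_max_cols_per_row = days * max_keys

print(f"entity+date key: {composite_rows:,} rows, <= {composite_max_cols_per_row} columns each")
print(f"entity-only key: {entity_rows:,} rows, <= {entity_max_cols_per_row:,} columns each")
```

Under these assumptions the composite key gives about 36.5 billion rows of at most 100 columns, while the entity-only key gives 10 million rows of up to 365,000 columns.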

The queries I need to make are for sets of key/value properties for an entity on a day, e.g. key1, key2 and key3 for 10 entities on 20 days. The number of entities and/or days in a query could be either very small or very large.

I've modeled this with a simple column family in which the row key is the concatenation of the entity and the date, and each property key is a column. My first attempt used only the entity as the row key, with a supercolumn for each date. I decided against this mostly because it seemed more complex for a gain I didn't really understand.
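A minimal sketch of the composite-key model, using plain Python dicts as a stand-in for the column family rather than any real Cassandra client API. The `entity:YYYYMMDD` row-key format and the `row_key`/`query` helpers are hypothetical, chosen only for illustration.

```python
from datetime import date, timedelta

# Hypothetical in-memory stand-in for a column family:
# row key -> {column name (property key): value}.
ColumnFamily = dict[str, dict[str, str]]

def row_key(entity: str, day: date) -> str:
    # Composite row key: entity and date concatenated. The ":" separator
    # and YYYYMMDD date format are assumptions for illustration.
    return f"{entity}:{day:%Y%m%d}"

def query(cf: ColumnFamily, entities: list[str], days: list[date],
          keys: list[str]) -> dict[str, dict[str, str]]:
    """Fetch the requested property keys for every <entity, day> pair.

    Loosely mirrors a multiget over the composite row keys with a
    column-name slice; missing rows or columns are left out of the result.
    """
    result: dict[str, dict[str, str]] = {}
    for e in entities:
        for d in days:
            rk = row_key(e, d)
            row = cf.get(rk)
            if row is None:
                continue
            cols = {k: row[k] for k in keys if k in row}
            if cols:
                result[rk] = cols
    return result

# Usage: 10 entities x 20 days, each row holding four properties.
cf: ColumnFamily = {}
start = date(2011, 1, 1)
days = [start + timedelta(days=i) for i in range(20)]
entities = [f"entity{i}" for i in range(10)]
for e in entities:
    for d in days:
        cf[row_key(e, d)] = {"key1": "a", "key2": "b", "key3": "c", "key4": "d"}

res = query(cf, entities, days, ["key1", "key2", "key3"])
print(len(res))
```

In this sketch the example query touches one row per <entity, day> pair (200 rows here), each sliced to three columns; with an entity-only row key the same query would instead read 10 rows, each sliced by date.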

Does this seem sensible?



Franc Carter | Systems architect | Sirca Ltd

franc.carter@sirca.org.au | www.sirca.org.au

Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215