incubator-cassandra-user mailing list archives

From <pco...@cegetel.net>
Subject Cassandra DataModeling recommendations
Date Tue, 29 Nov 2011 10:02:08 GMT
Hi all,
In order to evaluate NoSQL solutions and to gain knowledge, I am currently working on a kind
of prototype. 
Here is a brief overview of the scope:

I would like to manage user carts. Let's keep things simple:
A user can have up to n (let's say 3, for example) carts. Each cart will contain metadata,
including an expiration date, and a blob containing stuff (XML in fact, but I really don't
care about the content).

A user can save, retrieve or delete his carts. Additionally, a dedicated batch process would
remove carts that have expired.

Basically I was thinking of two ways to model the data:
1- A ColumnFamily with the userid as the row key and several SuperColumns, each one describing
a Cart and its content.
This has the advantage that I can get all the Carts in a single get, or do slice queries
to get only some Carts. The problem is that, if I am right, I cannot create a secondary index
on the expiration date column inside each Cart.
2- A ColumnFamily with a composite key like userid::cartId containing the expiration date
column and the blob. In that case I can create an index to query on the expiration
timestamp. The drawback is that, to get all the Carts of a user, I need either a second
ColumnFamily listing the carts associated with a userid, or something like the
OrderPreservingPartitioner so that I can perform a key range query.
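
To make the trade-off concrete, here is a minimal sketch of the two layouts as plain Python
dictionaries. It is only an in-memory illustration, not Cassandra client code; the sample
user ids, cart ids, timestamps and the helper function names are all made up for the example.

```python
# Layout 1: row key = user id, one "super column" per cart.
layout1 = {
    "user1": {
        "cart1": {"expires": 1322560000, "blob": "<xml/>"},
        "cart2": {"expires": 1322570000, "blob": "<xml/>"},
    },
    "user2": {
        "cart1": {"expires": 1322580000, "blob": "<xml/>"},
    },
}

# Layout 2: row key = "userid::cartid", standard columns only.
layout2 = {
    "user1::cart1": {"expires": 1322560000, "blob": "<xml/>"},
    "user1::cart2": {"expires": 1322570000, "blob": "<xml/>"},
    "user2::cart1": {"expires": 1322580000, "blob": "<xml/>"},
}


def carts_of_user_layout1(user_id):
    """A single row lookup returns every cart of the user."""
    return dict(layout1.get(user_id, {}))


def carts_of_user_layout2(user_id):
    """Without a per-user listing, every row key has to be inspected."""
    prefix = user_id + "::"
    return {
        key.split("::", 1)[1]: columns
        for key, columns in layout2.items()
        if key.startswith(prefix)
    }


def expired_layout2(now):
    """Expiry is a top-level column here, so it can be filtered on directly."""
    return sorted(key for key, columns in layout2.items() if columns["expires"] < now)
```

Both functions return the same carts for a given user, but the second one walks over all
rows, which is what a per-user listing or an ordered key range would be needed to avoid.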

I made some tests and ran into some problems.
First, in case 2 I was unable to perform queries like:
get Carts where timestamp < xxxxxxx; The (ugly, really!) workaround was to create a fake
column always set to true, and the query that worked was:
get Carts where dummy=true and timestamp < xxxxxxx; But I really dislike this solution
and I am almost sure this is not the right way to go.
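
A possible explanation, stated here as an assumption rather than something verified against
the Cassandra source: the built-in secondary indexes seem to need at least one equality
clause on an indexed column to select candidate rows, and any range clause is then applied
as a filter on those candidates. The toy model below mimics that behaviour in plain Python;
the row contents and the function names are hypothetical.

```python
from collections import defaultdict

# Rows in the style of case 2, with the fake "dummy" column always set to True.
rows = {
    "user1::cart1": {"dummy": True, "timestamp": 100},
    "user1::cart2": {"dummy": True, "timestamp": 300},
    "user2::cart1": {"dummy": True, "timestamp": 200},
}


def build_index(rows, column):
    """Map each value of `column` to the set of row keys holding that value."""
    index = defaultdict(set)
    for key, columns in rows.items():
        index[columns[column]].add(key)
    return index


dummy_index = build_index(rows, "dummy")


def query(dummy_value, max_timestamp):
    """Equality lookup in the index first, then a range filter on the candidates."""
    candidates = dummy_index.get(dummy_value, set())
    return sorted(key for key in candidates if rows[key]["timestamp"] < max_timestamp)
```

In this model the dummy column matches every row, so the equality step selects nothing
useful and the range filter still has to look at all the carts, which fits the feeling that
the workaround is not the right way to go.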

I tried something different: creating a dedicated timestamp ColumnFamily whose row key is
based on a timestamp and whose columns refer to users and carts. In that case, if I want the
outdated entries, I could perform a range query on the keys of this ColumnFamily. But again,
in that case I need the OrderPreservingPartitioner, and I fear that using a timestamp as a
key would lead to a poor distribution of the data among the nodes. If I stick with the second
proposal (with standard columns), the columns could directly be the keys, like userId::cartId,
and there is no logic in the removal process. If I stick with the first solution, I need
some logic to analyze the column name or value to get userid + cartid.
Another point: if I use this ColumnFamily I have to manage "updates". If, for example, I replace
Cart2 of user1, I need to remove the corresponding entry and add a new one. This is honestly
probably not the hardest part.

I have the feeling that having a userId based ColumnFamily with SuperColumns inside and a
dedicated timestamp table is the best choice. In fact I think that basically my requests will
be:
- Give me all the carts of a userId
- Remove all the expired carts, which is probably in fact 2 requests: find all carts whose
expiry date is before a given date, then delete them.
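
As a rough sketch of that choice, here is an in-memory Python model with a per-user structure
plus a separate list ordered by expiry date. It is not Cassandra code; the class name
CartStore, its method names and the limit of 3 carts are assumptions made for the example.

```python
import bisect


class CartStore:
    """Per-user carts plus an expiry-ordered index that must be kept in sync."""

    def __init__(self, max_carts=3):
        self.max_carts = max_carts
        self.carts = {}       # user_id -> {cart_id: (expires, blob)}
        self.by_expiry = []   # sorted list of (expires, user_id, cart_id)

    def _drop_index_entry(self, user_id, cart_id):
        expires, _ = self.carts[user_id][cart_id]
        self.by_expiry.remove((expires, user_id, cart_id))

    def save(self, user_id, cart_id, expires, blob):
        user_carts = self.carts.get(user_id, {})
        if cart_id in user_carts:
            # An update: the old index entry has to go before the new one is added.
            self._drop_index_entry(user_id, cart_id)
        elif len(user_carts) >= self.max_carts:
            raise ValueError("cart limit reached for " + user_id)
        self.carts.setdefault(user_id, {})[cart_id] = (expires, blob)
        bisect.insort(self.by_expiry, (expires, user_id, cart_id))

    def get_carts(self, user_id):
        """Request 1: give me all the carts of a user."""
        return dict(self.carts.get(user_id, {}))

    def delete(self, user_id, cart_id):
        self._drop_index_entry(user_id, cart_id)
        del self.carts[user_id][cart_id]
        if not self.carts[user_id]:
            del self.carts[user_id]

    def find_expired(self, now):
        """Request 2a: carts whose expiry date is strictly before `now`."""
        cut = bisect.bisect_left(self.by_expiry, (now,))
        return [(user_id, cart_id) for _, user_id, cart_id in self.by_expiry[:cut]]

    def purge_expired(self, now):
        """Request 2b: delete what was found."""
        expired = self.find_expired(now)
        for user_id, cart_id in expired:
            self.delete(user_id, cart_id)
        return expired
```

The point of the sketch is the bookkeeping: every save or delete touches both structures,
while the purge only reads the front of the ordered list.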

I am fairly new to NoSQL and especially to Cassandra so I would like to get any advice on:
1- Is Cassandra suited to this kind of storage? I would say yes.
2- What is the right way to model the data and the related constraints.

If my description is unclear or anyone needs more details, do not hesitate to ask.
Thanks in advance for any help or advice.

Regards

Pascal
