cassandra-user mailing list archives

From Matteo Caprari
Subject schema design question
Date Mon, 08 Mar 2010 12:18:16 GMT

We have a collection operation that generates documents like this:

item: {
  "id": "<unique item id>",
  "title": "...",
  "liked_by": ["user_2", "user_3", ...]
}

The liked_by list contains on average 100 unique users. Users may also
appear in other items.

Our database contains a few million entries and is growing at about 1M a day.
Around 10% of the incoming data is additional info about an existing
item (i.e. more likers), so a merge operation needs to be done.
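
To make the merge concrete, here is a minimal sketch in plain Python. It is
not taken from our current system: the function name merge_item and the
sample ids are made up for illustration, and it assumes that a merge is
simply the union of the two liked_by lists.

```python
def merge_item(existing, incoming):
    """Return a merged copy of `existing` with any new likers from `incoming`.

    Assumption: merging means taking the union of the liked_by lists,
    keeping the order in which users were first seen.
    """
    seen = set(existing["liked_by"])
    merged = dict(existing)
    merged["liked_by"] = list(existing["liked_by"])
    for user in incoming["liked_by"]:
        if user not in seen:
            seen.add(user)
            merged["liked_by"].append(user)
    return merged


# Hypothetical sample data: the update repeats user_3 and adds user_7.
stored = {"id": "item_1", "title": "example", "liked_by": ["user_2", "user_3"]}
update = {"id": "item_1", "title": "example", "liked_by": ["user_3", "user_7"]}
merged = merge_item(stored, update)
```

If likers were stored as one column per user under the item's row, this
read-merge-write step would probably not be needed, since writing an
existing column again just overwrites it.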

We are not too happy with our current system and are considering Cassandra.

I'm new to this kind of database, and I'd like to hear a few informed
opinions on how to design a Cassandra schema.
Of course we wish the system to keep up with the write/update rate and
answer our key queries 'as quickly as possible'.

The 'key' queries are:
- list all the items a user liked
- list all the users that liked an item
- list all users and count how many items each user liked
(we need this every few hours and in fact we are only interested in
the top N users that liked most stuff)
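
The three queries above can be sketched with an in-memory stand-in for two
lookup tables, one keyed by item and one keyed by user. This is only an
illustration of the access patterns, not Cassandra code: the class
LikeIndex, its method names and the sample ids are all hypothetical.

```python
from collections import defaultdict
import heapq


class LikeIndex:
    """Toy model: every like is written twice, once per lookup direction."""

    def __init__(self):
        self.users_by_item = defaultdict(set)  # item id -> users who liked it
        self.items_by_user = defaultdict(set)  # user -> item ids they liked

    def add_item(self, item):
        # Adding to a set is idempotent, so re-ingesting an item that only
        # brings more likers behaves like a merge.
        for user in item["liked_by"]:
            self.users_by_item[item["id"]].add(user)
            self.items_by_user[user].add(item["id"])

    def items_liked_by(self, user):
        return sorted(self.items_by_user.get(user, ()))

    def users_who_liked(self, item_id):
        return sorted(self.users_by_item.get(item_id, ()))

    def top_likers(self, n):
        # Full scan over all users; returns (count, user) pairs, largest first.
        counts = ((len(items), user) for user, items in self.items_by_user.items())
        return heapq.nlargest(n, counts)


index = LikeIndex()
index.add_item({"id": "item_1", "title": "a", "liked_by": ["user_2", "user_3"]})
index.add_item({"id": "item_2", "title": "b", "liked_by": ["user_3"]})
top = index.top_likers(1)
```

The first two queries become single-key lookups, while the top N query
still scans every user, which may be acceptable for a job that runs only
every few hours.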

:Matteo Caprari
