cassandra-user mailing list archives

From Ertio Lew <>
Subject Seeking Schema guidance
Date Mon, 05 Nov 2012 22:28:37 GMT
I need to store (1) posts written by users, (2) activity data by other
users on these posts, and (3) some counters for each post, such as view
counts, like counts, etc. So for each post there are 3 categories of data:
the original post data, stored in one CF using a single row per post; the
counters data, stored in a counter-type CF using 1 row per post; and the
activity data, where each user stores his own activity column for each
post he reacted to and also stores the activity data of all his friends,
in a dedicated row per user.

So here is my current schema plan:

For Posts:
1 CF with single row for each post

For Counters:
1 CF with single row for each post

For Activities Data:
1 CF with single row for each user
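
To make this plan concrete, here is a rough Python sketch in which plain
dicts stand in for the three CFs. All names (posts_cf, counters_cf,
activities_cf, render_post) and the sample data are made up for
illustration; this is not real client code, only the shape of the reads.

```python
# Plain dicts stand in for column families; every name here is hypothetical.
posts_cf = {"post1": {"author": "alice", "body": "hello"}}
counters_cf = {"post1": {"views": 10, "likes": 2}}
# One row per user; columns keyed by (post_id, acting_user) -> activity.
# The row holds the user's own activity plus that of his friends.
activities_cf = {
    "bob": {("post1", "bob"): "like", ("post1", "carol"): "comment"},
}


def render_post(post_id, viewer):
    """Collect everything needed to show one post: three separate reads."""
    post = posts_cf[post_id]                 # read 1: Posts CF
    counters = counters_cf.get(post_id, {})  # read 2: Counters CF
    row = activities_cf.get(viewer, {})      # read 3: Activities CF
    # Keep only the activity columns that belong to this post.
    activity = {who: what for (pid, who), what in row.items() if pid == post_id}
    return {"post": post, "counters": counters, "activity": activity}
```

Each call to render_post touches all three structures, which is the
3-CF read described here.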

Now, to show a post at any time I need all 3 categories of data, so I'm
forced to read 3 CFs. So I have been wondering whether I should merge
this data into a single CF, as a materialized view in a single row per
post, so that reads are more efficient.

Here is the idea I have got:

For each post I would store the post data (written once, never updated)
plus the activity data of all users on that post (written for each user at
different times, and possibly edited many times) in a single row. From
the activity data of all users I can calculate all the counters (by
iterating over the activity columns), so I don't need to store them
explicitly. So to read some 10 posts at a time, I just need to read 10
rows. I would also set a reasonable limit on the number of columns to
read, so that if a post has too much activity I don't have to read every
column; in those (less frequent) cases I would perform a second query to
read the counters from another CF. So most of the time I would be reading
from a single CF and a single row for each post. One issue, though, is
that since that single row will contain the activity of several users
(each column added to the row at a different time), the row might be
spread across many SSTables. So which schema is better for me with
respect to performance, the 1st or the 2nd?
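
A minimal sketch of the 2nd idea, again with dicts standing in for CFs.
The column naming ("a:" prefix for post data, "b:<user>" for activity),
COLUMN_LIMIT, read_row and load_post are all assumptions made up for this
example; read_row only imitates a column slice with a limit.

```python
from collections import Counter

COLUMN_LIMIT = 5  # hypothetical cap on columns fetched per row

# One wide row per post. Column names sort so that post data ("a:...")
# comes before activity columns ("b:<user>"); this naming is an assumption.
posts_wide_cf = {
    "p1": {"a:body": "hello", "b:bob": "like", "b:carol": "view"},
    "p2": {"a:body": "popular", **{f"b:u{i}": "like" for i in range(50)}},
}
# Fallback counters CF, consulted only for rows too wide to read fully.
counters_cf = {"p2": {"like": 50}}


def read_row(cf, key, limit):
    """Imitate a column slice: the first `limit` columns in name order."""
    return sorted(cf[key].items())[:limit]


def load_post(post_id):
    cols = read_row(posts_wide_cf, post_id, COLUMN_LIMIT)
    data = {name[2:]: value for name, value in cols if name.startswith("a:")}
    if len(cols) < COLUMN_LIMIT:
        # Whole row fit under the limit: derive counters by iterating
        # over the activity columns.
        counts = dict(Counter(v for n, v in cols if n.startswith("b:")))
    else:
        # Row may be truncated: second query against the counters CF.
        counts = dict(counters_cf.get(post_id, {}))
    return data, counts
```

In this sketch load_post("p1") needs one read, while load_post("p2") hits
the limit and falls back to the counters CF, so that CF would still have
to be kept up to date for the wide rows.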
