cassandra-user mailing list archives

From Ertio Lew <ertio...@gmail.com>
Subject Re: Seeking Schema guidance
Date Tue, 06 Nov 2012 16:31:39 GMT
Thoughts?


On Tue, Nov 6, 2012 at 3:58 AM, Ertio Lew <ertiop93@gmail.com> wrote:

> I need to store (1) posts written by users, (2) activity data by other
> users on these posts, and (3) some counters for each post, such as view
> counts, like counts, etc. So for each post there are 3 categories of
> data: the original post data, stored in one CF using a single row per
> post; the counters data, stored in a counter-type CF using 1 row per
> post; and the activity data, where each user stores his own activity
> column for each post he reacted to and also stores the activity data of
> all his friends in a dedicated row for every user.
>
>
> So here is my current schema plan:
>
> For Posts:
> -------------
> 1 CF with single row for each post
>
>
> For Counters:
> ------------------
> 1 CF with single row for each post
>
>
> For Activities Data:
> ---------------------------
> 1 CF with single row for each user
>
>
>
> Now, to show a post at any time I need all 3 categories of data, so I'm
> forced to read 3 CFs. So I have been wondering why I shouldn't try to
> merge this data into a single CF, as a materialized view in a single
> row, so that read queries can be served more efficiently.
>
> Here is the idea I have got:
>
> For each post I would store the post data (written once, never updated)
> plus the activity data of all users on that post (written for each user
> at different times, and possibly edited many times) in a single row.
> Using the activity data of all users I can calculate all the counters
> (by iterating over the activity columns), so I don't need to store them
> explicitly. So for reading some 10 posts at a time, I just need to read
> 10 rows. I would also set a reasonable limit on the number of columns to
> read, so that if a post's counters are too big I don't have to read
> every column; in those (less frequent) cases I would perform a second
> query to read the counters from another CF. So most of the time I would
> be reading from a single CF and a single row for each post. But another
> issue is that since that single row will contain the activity of several
> users (each column added to the row at a different time), the row might
> be spread across many SSTables. So which schema is better for me with
> respect to performance, the 1st one or the 2nd?
>
> Thanks.
>
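
To make the second idea concrete, here is a minimal in-memory sketch of the read path described above: one wide row per post, a column limit on the read, counters derived from the activity columns when the whole row fits, and a second read from a counters CF when it does not. Everything here is an assumption for illustration only: the plain dicts stand in for column families, the names build_row, read_post, merged_cf and counters_cf and the "post:" / "activity:" column prefixes are hypothetical, and no Cassandra client API is used. It also assumes the post columns sort before the activity columns, so a limited read always returns the post data.

```python
from collections import Counter


def build_row(post_data, activities):
    """Build one wide row: post columns first, then one activity column per user."""
    columns = [("post:" + name, value) for name, value in post_data.items()]
    columns += [("activity:" + user, action)
                for user, action in sorted(activities.items())]
    return columns


def read_post(post_id, merged_cf, counters_cf, column_limit):
    """Read one post with a column limit; return (post, counters, number_of_reads)."""
    row = merged_cf[post_id]
    columns = row[:column_limit]          # limited slice of the wide row
    truncated = len(row) > column_limit   # some activity columns were not read

    post = {name[len("post:"):]: value
            for name, value in columns if name.startswith("post:")}

    if truncated:
        # Less frequent case: too many columns, so do a second read
        # against the separate counters CF.
        counters = dict(counters_cf[post_id])
        reads = 2
    else:
        # Common case: derive the counters by iterating the activity columns.
        counters = dict(Counter(value for name, value in columns
                                if name.startswith("activity:")))
        reads = 1
    return post, counters, reads


# Hypothetical sample data: one small post and one very active post.
merged_cf = {
    "p1": build_row({"text": "hello"},
                    {"u1": "like", "u2": "like", "u3": "view"}),
    "p2": build_row({"text": "popular"},
                    {f"u{i}": "like" for i in range(50)}),
}
counters_cf = {"p2": {"like": 50}}

small = read_post("p1", merged_cf, counters_cf, column_limit=10)
large = read_post("p2", merged_cf, counters_cf, column_limit=10)
print(small)
print(large)
```

One thing the sketch makes visible: the fallback only works if the counters CF is still kept up to date on every write, so the second layout does not remove the counter writes, it only avoids the counter read for posts whose row fits within the column limit.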
