Thoughts?


On Tue, Nov 6, 2012 at 3:58 AM, Ertio Lew <ertiop93@gmail.com> wrote:
I need to store (1) posts written by users, (2) activity data from other users on these posts, and (3) some counters for each post, such as view counts and like counts. So each post has three categories of associated data: the original post data, stored in one CF with a single row per post; the counters, stored with one row per post in a counter-type CF; and the activity data, where each user stores his own activity column for each post he reacted to and also stores the activity data of all his friends, in a dedicated row per user.


So here is my current schema plan:

For Posts:
-------------
1 CF with a single row for each post


For Counters:
------------------
1 CF (counter type) with a single row for each post


For Activity Data:
---------------------------

1 CF with a single row for each user
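A minimal sketch of this first (three-CF) plan, using plain Python dicts as hypothetical stand-ins for the column families rather than a real Cassandra client; all names (`posts_cf`, `counters_cf`, `activities_cf`, `record_activity`, `read_post`) are made up for illustration. It shows each activity being fanned out into the actor's and his friends' rows, and that displaying one post costs one read per CF.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the three column families (CFs).
# Each CF maps a row key to a dict of column name -> value.
posts_cf = {}                                          # one row per post
counters_cf = defaultdict(lambda: defaultdict(int))    # one row per post
activities_cf = defaultdict(dict)                      # one row per user


def add_post(post_id, author, body):
    """Post data is written once and never updated."""
    posts_cf[post_id] = {"author": author, "body": body}


def record_activity(post_id, actor, activity, friends_of):
    """Write (or edit) one user's activity on a post."""
    column = (post_id, actor)
    previous = activities_cf[actor].get(column)
    # Fan-out on write: the actor's own row and every friend's row get the column.
    for user in [actor, *friends_of.get(actor, [])]:
        activities_cf[user][column] = activity
    # Keep the counters row in step; an edit moves the count to the new activity.
    if previous is not None:
        counters_cf[post_id][previous] -= 1
    counters_cf[post_id][activity] += 1


def read_post(post_id, viewer):
    """Showing one post needs three reads, one per CF."""
    post = posts_cf[post_id]                  # read 1: Posts CF
    counters = dict(counters_cf[post_id])     # read 2: Counters CF
    activity = {                              # read 3: the viewer's activity row
        actor: act
        for (pid, actor), act in activities_cf[viewer].items()
        if pid == post_id
    }
    return post, counters, activity
```

For example, after `record_activity("p1", "alice", "like", {"alice": ["bob"]})`, calling `read_post("p1", "bob")` finds Alice's like in Bob's own row, but still has to touch the Posts and Counters CFs separately.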



Now, to show a post at any time I need all three categories of data, so I'm forced to read three CFs. So I have been wondering whether I should merge this data into a single CF, as a materialized view with a single row per post, so that reads could be made more efficient.

Here is the idea I have:

For each post, I would store the post data (written once, never updated) plus the activity data of all users on that post (written by each user at different times, and possibly edited many times) in a single row. Using the activity data of all users, I can calculate all the counters (by iterating over the activity columns), so I don't need to store them explicitly. So to read some 10 posts at a time, I just need to read 10 rows. I would also set a reasonable limit on the number of columns to read, so that if a post's counters are too big I don't have to read all the columns; in those (less frequent) cases, I would perform a second query to read the counters from another CF. So most of the time I would be reading from a single CF and a single row per post.

But another issue is that since that single row will contain the activity of several users (each column added to the row at a different time), the row might end up spread across many SSTables. So which schema is better for me with respect to performance: the first one or the second?
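A similar sketch of the second (single-row) plan, again with dicts standing in for Cassandra rows; the names (`make_row`, `upsert_activity`, `read_merged_post`, `column_limit`) are hypothetical. Counters are derived by counting the activity columns, and the read slices one column past the limit so that a too-wide row is detected and its counters come from a separate counters CF instead. It does not model how the row's columns spread across SSTables, which is the actual performance question.

```python
from collections import Counter
from itertools import islice


def make_row(post_data):
    """One row per post: the post data plus one activity column per user."""
    return {"post": post_data, "activity": {}}


def upsert_activity(row, user, activity):
    """A user's column is simply overwritten when he edits his activity."""
    row["activity"][user] = activity


def read_merged_post(row, column_limit, counters_cf, post_id):
    """Read one post, deriving counters from its activity columns when possible."""
    # Take at most one column more than the limit, mimicking a column-count-limited
    # read, so a row wider than the limit is detected without reading all of it.
    columns = list(islice(row["activity"].items(), column_limit + 1))
    if len(columns) <= column_limit:
        # The whole row was read: derive counters by iterating the activity columns.
        counters = dict(Counter(act for _, act in columns))
        return row["post"], counters, dict(columns)
    # Row is wider than the limit: fall back to a second query on a counters CF.
    return row["post"], dict(counters_cf[post_id]), dict(columns[:column_limit])
```

The design choice here is that counters are never written on the common path; the trade-off is that every read iterates the activity columns, and only posts over the limit pay for the second query.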

Thanks.