cassandra-user mailing list archives

From Tyler Hobbs <ty...@riptano.com>
Subject Re: Building a News-feed that comprises posts “created by user's connections” && “on the topics user is following”
Date Mon, 10 Jan 2011 15:20:56 GMT
> Actually, the person who is posting will not be restricted in the kinds of
> topics he can post on. He may even post on topics beyond the list of those
> he himself is following. Thus this list cannot be defined in advance, and
> since the topics list would likely run into the hundreds, it would not be
> possible to implement this here.
> The database would know the defined, limited list of topics a user is
> following, but would place no restrictions on what he could post on.
>
> Even if you think of it the other way, "putting all the topics that his
> followers are following" as categories for splitting the rows of a user's
> followers by topic, this would be way too much denormalization. Although it
> may reduce the pain in frequent operations, it may increase the pain too
> much somewhere else.
>

Yes, I was describing:

For every user, create one row for every topic and populate each of those
with that user's followers who are interested in the topic. Note that if a
user does not have any followers who are interested in a topic, that row
will have no columns, so the row won't exist.
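A minimal sketch of that layout, assuming plain in-memory dicts stand in for the column families. The names `followers_by_topic`, `timelines`, `follow` and `publish` are hypothetical, not a Cassandra API. Each row is keyed by author and topic and holds the followers interested in that topic, so publishing a post only touches the rows for the post's topics.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical stand-in for a column family: row key "<author>:<topic>" ->
# followers of <author> who are interested in <topic>. A row is created only
# when a column is added, mirroring the fact that an empty row does not exist.
followers_by_topic: Dict[str, Set[str]] = {}

# Hypothetical stand-in for the timeline column family: user -> post IDs,
# newest first.
timelines: Dict[str, List[str]] = defaultdict(list)


def row_key(author: str, topic: str) -> str:
    return f"{author}:{topic}"


def follow(follower: str, author: str, follower_topics: Set[str]) -> None:
    """Denormalize on follow: add the follower to one row per topic
    the follower is interested in."""
    for topic in follower_topics:
        followers_by_topic.setdefault(row_key(author, topic), set()).add(follower)


def publish(author: str, post_id: str, topics: Set[str]) -> None:
    """Fan out on write: read one row per post topic, then append the post
    to the timeline of each interested follower exactly once."""
    recipients: Set[str] = set()
    for topic in topics:
        recipients |= followers_by_topic.get(row_key(author, topic), set())
    for follower in recipients:
        timelines[follower].insert(0, post_id)
```

Followers with no interest in any of the post's topics never receive it, so no read-time filtering is needed. The cost moves to follow time: a follower who picks up a new topic would have to be added to that topic's row for every user they follow.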

The amount of denormalization is not excessive here. I would guess that you
have to store roughly 10x as much information about followers, and followers
are a small amount of data compared to posts and timelines.


> What do you think about the JSON-encoded columns (containing the list of
> topic tags and the corresponding post ID) that I referred to above? Although
> this does put some pressure on reads, it still sounds quite OK(?). Let me
> know your views.
>

The method you described is not bad, but it does have some downsides.
First, you will potentially be appending posts to the timelines of users who
are not interested in the topic.  As you say, you will have to resolve this at
read time by checking the user's interested topics and filtering the
timeline.  This means "getting the last 10 posts" may take more than one
read: even if you fetch more than ten posts from the timeline, fewer than ten
may survive filtering. You will also have to read the user's list of
interested topics.
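To make the extra reads concrete, here is a minimal sketch of read-time filtering, assuming the timeline has already been decoded from the JSON columns into `(post_id, tags)` pairs, newest first. `read_page`, `latest_matching` and `page_size` are hypothetical names; `read_page` stands in for one slice query against the timeline row, and the separate read of the user's interested topics is assumed to have happened already.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical decoded timeline entry: (post_id, topic tags), newest first.
Entry = Tuple[str, FrozenSet[str]]


def read_page(timeline: List[Entry], start: int, page_size: int) -> List[Entry]:
    """Stand-in for one slice query against the user's timeline row."""
    return timeline[start:start + page_size]


def latest_matching(timeline: List[Entry], interested: FrozenSet[str],
                    want: int = 10, page_size: int = 20) -> Tuple[List[str], int]:
    """Return up to `want` post IDs whose tags overlap `interested`, plus
    the number of timeline reads it took. `interested` is assumed to have
    been fetched already (one more read, not counted here)."""
    results: List[str] = []
    start = 0
    reads = 0
    while len(results) < want:
        page = read_page(timeline, start, page_size)
        reads += 1
        if not page:  # timeline exhausted
            break
        for post_id, tags in page:
            if tags & interested:
                results.append(post_id)
                if len(results) == want:
                    break
        start += page_size
    return results, reads
```

For example, if only one post in three matches and `page_size` is 20, the first slice yields just 7 matches, so a second read is needed to reach 10. If nothing matches, every slice of the timeline is read before giving up.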

Second, doing a per-topic timeline becomes painful either at the time of
post creation or when reading the user's timeline.

You always want to denormalize and write more data if it means you can make
fewer reads elsewhere (within reasonable limits).  Remember, writes are 10x
faster than reads and disk space is cheap.

- Tyler
