cassandra-user mailing list archives

From Aklin_81 <asdk...@gmail.com>
Subject Re: Building a News-feed that comprises posts “created by user's connections” && “on the topics user is following”
Date Mon, 10 Jan 2011 12:44:51 GMT
Thanks so much, Tyler.

Actually, the person who is posting is not restricted in the topics he can
post on. He may even post on topics outside the list of topics he himself is
following. So this list cannot be defined in advance, and since there may be
hundreds of topics, it would not be feasible to implement this here.
The database would know the defined, limited list of topics each user is
following, but there is no restriction on what a user can post on.

Even if you consider the alternative of using "all the topics that his
followers are following" as the categories for splitting the rows of a user's
followers by topic, this would be far too much denormalization. Although it
may reduce the pain in frequent operations, it may add too much pain
somewhere else.
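A minimal sketch of the per-topic follower rows from Tyler's reply (quoted below), in the variant described above, where a poster gets one follower row per topic that any of his followers follows. Assumptions: plain Python dicts stand in for the FOLLOWERS and timeline column families, and `follow`, `publish`, and this `multiget` are hypothetical helpers, not a Cassandra client API; `uuid.uuid1()` stands in for a TimeUUID column name.

```python
import uuid
from collections import defaultdict

# Hypothetical in-memory stand-ins for Cassandra column families:
# row key -> {column name: column value}. Not a real client API.
FOLLOWERS = defaultdict(dict)  # 'poster::topic' -> {follower: ''}
TIMELINE = defaultdict(dict)   # follower -> {post UUID: post id}


def follow(follower, poster, follower_topics):
    """Denormalize: add the follower to one row per topic he follows."""
    for topic in follower_topics:
        FOLLOWERS[f"{poster}::{topic}"][follower] = ""


def multiget(cf, keys):
    """Mimics a multiget: returns only the requested rows that exist."""
    return {k: cf[k] for k in keys if k in cf}


def publish(poster, post_id, tags):
    post_uuid = uuid.uuid1()  # time-based UUID, like a TimeUUID column name
    rows = multiget(FOLLOWERS, [f"{poster}::{t}" for t in tags])
    for followers in rows.values():
        for follower in followers:
            # Same column name for every topic row, so a follower who
            # matches several tags still gets a single timeline entry.
            TIMELINE[follower][post_uuid] = post_id
    return post_uuid
```

The number of rows per poster grows with the number of distinct topics his followers follow, which is the denormalization cost being weighed here; the payoff is that publishing only touches the rows that match the post's tags, even when the poster himself does not follow those topics.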

What do you think about the JSON-encoded columns (containing the list of topic
tags and the corresponding postID) that I described in my earlier message?
This does put some pressure on reads, but it still sounds quite OK(?). Let me
know your views.
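The JSON-encoded column approach (fan out every post to all followers, filter by topic at read time) might look roughly like this. Assumptions: dicts stand in for the column families, `NETWORK`, `FOLLOWED_TOPICS`, `FEED`, `publish`, and `read_feed` are hypothetical names, and a nanosecond timestamp is used as the column name (a TimeUUID would avoid collisions between posts written at the same instant).

```python
import json
import time
from collections import defaultdict

# Hypothetical dict stand-ins for column families (not a client API).
NETWORK = defaultdict(set)          # poster -> set of followers
FOLLOWED_TOPICS = defaultdict(set)  # user -> topics the user follows
FEED = defaultdict(dict)            # user -> {timestamp: JSON value}


def publish(poster, post_id, tags, ts=None):
    ts = time.time_ns() if ts is None else ts  # column name sorts by time
    value = json.dumps({"post_id": post_id, "tags": sorted(tags)})
    for follower in NETWORK[poster]:
        FEED[follower][ts] = value  # one write per follower, no topic check


def read_feed(user, limit=20):
    """Newest first; keep only posts sharing a tag with the user's topics."""
    topics = FOLLOWED_TOPICS[user]
    result = []
    for ts in sorted(FEED[user], reverse=True):
        entry = json.loads(FEED[user][ts])
        if topics.intersection(entry["tags"]):
            result.append(entry["post_id"])
            if len(result) == limit:
                break
    return result
```

Writes stay cheap and independent of topics, but the read path has to decode and discard every non-matching entry, so a user who follows few of his network's topics may need many columns scanned to fill one page of the feed.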

Thanks again.



On Mon, Jan 10, 2011 at 9:54 AM, Tyler Hobbs <tyler@riptano.com> wrote:

> I also posted this to StackOverflow, but I'll post here as well.
>
> ===
>
> I'm assuming you've already studied the Twissandra example application.
> It's very close to what you're describing. Here are a couple of useful
> links:
>
>    - Twissandra GitHub project page: <https://github.com/ericflo/twissandra>
>    - Riptano documentation on Twissandra: <http://www.riptano.com/docs/0.6/data_model/twissandra>
>
> The primary difference with your application is the introduction of topics.
> How you store the data depends on exactly how you want to be able to query
> it. For example, you might be fine with all topics being presented in the
> same timeline, or you might want to be able to see a timeline only for a
> specific topic (like SO tags, for example).
>
> If you don't need separate timelines, I recommend the following, using the
> Twissandra data model as the base:
>
> Instead of the normal FOLLOWERS column family, maintain one row of
> followers for every user for *each* topic. Obviously, this causes a little
> extra work when creating/altering/dropping users, but it saves you work when
> new posts are created, which is the bulk of the operations you need to
> handle.
>
> When a post is made by user Joe on topics A, B, and C, you'll be able to
> get all of the interested users with a query like:
>
> multiget(FOLLOWERS, ['Joe::A', 'Joe::B', 'Joe::C'])
>
> where 'Joe::A', 'Joe::B', and 'Joe::C' are row keys. For each of the
> followers that you get back, you can simply add the post's UUID as a column
> name to each follower's timeline (and you won't have to worry about
> duplicates in the timeline since you're using the same UUID for the column
> name).
>
> If you want to be able to support per-topic timelines for each user, I
> suggest you use one row for each topic that a user is interested in and one
> row for the all-topics timeline. Since you are already fetching followers by
> topic, it's easy to know which of the post's topics each follower is
> interested in, so appending the post to the correct per-topic timelines is
> straightforward.
> - Tyler
>
>
> On Sun, Jan 9, 2011 at 9:28 PM, Aklin_81 <asdkl93@gmail.com> wrote:
>
>> I could think of one way as follows:
>>
>> At write time, notify all followers about posts from their network by
>> adding a column to each follower's row, with a timestamp as the column name
>> (to sort by time) and a JSON value that contains two attributes:
>> #PostIdKey and <list of tags of this post>.
>> At read time, compare <list of tags of this post> with the topics the
>> user is following; if they match, show the post. But this will of course
>> increase the pressure on reads, which would better have been shifted
>> toward writes.
>>
>> Please suggest any better way that you can think of.
>>
>> Thanks
>>
>>
>> On Mon, Jan 10, 2011 at 12:11 AM, Aklin_81 <asdkl93@gmail.com> wrote:
>>
>>> I am working on a Questions & Answers website project that allows a
>>> user to follow questions on certain topics from his network.
>>>
>>> I want to build a user's news-feed wall that comprises only those
>>> questions that have been posted by his connections and tagged with topics
>>> that he is following (his expertise topics).
>>>
>>> After studying Cassandra, I realized that a simple news-feed design that
>>> shows all the posts from a user's network would be easy to build in
>>> Cassandra by executing fast writes to all of a user's followers whenever
>>> the user posts. But for my kind of application, where there is an
>>> additional filter of 'followed topics' (i.e., the user receives posts
>>> "created by his network" && "on topics user is following"), I could not
>>> convince myself of a good schema design in Cassandra. I hope I have just
>>> missed something because of my limited understanding of Cassandra. Could
>>> you please help me out with your suggestions for a schema, or for how this
>>> news-feed could be implemented in Cassandra?
>>>
>>> Many thanks !
>>>
>>>
>>
>
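Tyler's last suggestion in the quoted reply, per-topic timelines kept alongside an all-topics timeline, can be sketched the same way. Assumptions: Python dicts stand in for the column families, timeline row keys take the hypothetical forms 'user' and 'user::topic', `publish` is an illustrative helper rather than a Cassandra API, and `uuid.uuid1()` stands in for a TimeUUID column name.

```python
import uuid
from collections import defaultdict

# Hypothetical dict stand-ins for column families (not a client API).
FOLLOWERS = defaultdict(dict)  # 'poster::topic' -> {follower: ''}
TIMELINES = defaultdict(dict)  # 'user' or 'user::topic' -> {post UUID: post id}


def publish(poster, post_id, tags):
    post_uuid = uuid.uuid1()
    for topic in tags:
        key = f"{poster}::{topic}"
        if key not in FOLLOWERS:  # nobody follows this poster on this topic
            continue
        for follower in FOLLOWERS[key]:
            # The follower row key already tells us which topic matched.
            TIMELINES[f"{follower}::{topic}"][post_uuid] = post_id
            # All-topics row: same column name, so no duplicate entries.
            TIMELINES[follower][post_uuid] = post_id
    return post_uuid
```

Because each follower row is fetched per topic, the matching topic comes for free from the row key, so the extra per-topic timeline write needs no read-time filtering.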
