incubator-cassandra-user mailing list archives

From Drew Kutcharian <d...@venarc.com>
Subject Re: Revised: Data Modeling advise for Cassandra 0.8 (added #8)
Date Thu, 31 Mar 2011 17:51:58 GMT
Thanks Aaron,

I have already checked out Twissandra. I was mainly looking to see how Secondary Indexes can
be used and how they affect Data Modeling. There doesn't seem to be a lot of coverage on them.

In addition, I couldn't tell which Partitioner Twissandra is using, or why.

cheers,

Drew


On Mar 31, 2011, at 5:53 AM, aaron morton wrote:

> Drew, 
> 	The Twissandra project is a Twitter clone built on Cassandra; it may give you some insight into how things can be modelled: https://github.com/thobbs/twissandra
> 
> 	If you are just starting then consider something like...
> 
> 	- CF to hold the user, their data and their network links  
> 	- standard CF to hold a blog entry, key is a timestamp 
> 	- standard CF to hold blog comments, each comment as a single column where the name is a long timestamp 
> 	- standard CF to hold the blogs for a user, key is the user id and each column is the blog key 
> 
> That's not a great schema, but it's a simple starting point you can build on and refine using things like secondary indexes and doing more/less in the same CF. 
> 
> Good luck. 
> Aaron
> 
> On 30 Mar 2011, at 15:13, Drew Kutcharian wrote:
> 
>> I'm pretty new to Cassandra and I would like to get your advice on modeling. The object model of the project that I'm working on will be pretty close to Blogger, Tumblr, etc. (or any other blogging website).
>> There are Users, each of whom can have many Blogs, and each Blog can have many Comments. How would you model this efficiently considering:
>> 
>> 1) Be able to directly link to a User
>> 2) Be able to directly link to a Blog
>> 3) Be able to query and get all the Blogs for a User ordered by time created descending (new blogs first)
>> 4) Be able to query and get all the Comments for each Blog ordered by time created ascending (old comments first)
>> 5) Be able to link different Users to each other, as a network.
>> 6) Have a well distributed hash so we don't end up with "hot" nodes, while the rest of the nodes are idle
>> 7) It would be nice to show a User how many Blogs they have or how many comments are on a Blog, without iterating thru the whole dataset.
>> NEW: 8) Be able to query for the most recently added Blogs. For example, Blogs added today, this week, this month, etc.
>> 
>> The target Cassandra version is 0.8, so that we can use Secondary Indexes. The goal is to be very efficient, so no text keys. We were thinking of using time-based 64-bit ids generated with Snowflake.
>> 
>> Thanks,
>> 
>> Drew
> 
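
The four column families Aaron lists can be sketched roughly as below. This is only an in-memory illustration using plain Python dicts, not Cassandra client code; the names (users, blog_entries, blog_comments, user_blogs) and the helper functions are hypothetical and do not come from the thread. It assumes blog keys and comment column names are time-based integers, so that sorting them gives time order.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the four column families.
# Outer dict key = row key, inner dict = columns (name -> value).
users = {}                          # user id -> user data and network links
blog_entries = {}                   # blog key (time-based) -> blog columns
blog_comments = defaultdict(dict)   # blog key -> {comment timestamp: comment text}
user_blogs = defaultdict(dict)      # user id -> {blog key: ""}


def add_user(user_id, name):
    users[user_id] = {"name": name, "links": set()}


def add_blog(user_id, blog_key, title, body):
    # Write the entry itself, then record its key in the per-user row.
    blog_entries[blog_key] = {"user_id": user_id, "title": title, "body": body}
    user_blogs[user_id][blog_key] = ""


def add_comment(blog_key, timestamp, text):
    # One column per comment; the column name is the timestamp.
    blog_comments[blog_key][timestamp] = text


def blogs_for_user(user_id):
    # Requirement 3: newest blogs first (descending time-based keys).
    return sorted(user_blogs[user_id], reverse=True)


def comments_for_blog(blog_key):
    # Requirement 4: oldest comments first (ascending timestamps).
    return [text for _, text in sorted(blog_comments[blog_key].items())]


add_user(1, "drew")
add_blog(1, 100, "first", "...")
add_blog(1, 200, "second", "...")
add_comment(100, 5, "later comment")
add_comment(100, 3, "earlier comment")
print(blogs_for_user(1))
print(comments_for_blog(100))
```

In Cassandra itself the sorting here would not be done by the client: columns within a row are stored in comparator order, and a slice can be read in reverse, which is what makes the "one column per comment" and "one column per blog key" rows convenient for requirements 3 and 4.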


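For the time-based 64-bit ids mentioned in the original question, a Snowflake-style generator can be sketched as follows. This is a simplified illustration and not Twitter's Snowflake service: it assumes the commonly described layout of 41 bits of milliseconds since a custom epoch, 10 bits of worker id and 12 bits of per-millisecond sequence. The class name, the epoch value (2011-01-01 UTC) and the injectable clock parameter are assumptions made for this example.

```python
import threading
import time

EPOCH_MS = 1293840000000   # assumed custom epoch: 2011-01-01T00:00:00Z
WORKER_BITS = 10
SEQ_BITS = 12
SEQ_MASK = (1 << SEQ_BITS) - 1


class SnowflakeLikeIds:
    """Hypothetical generator of roughly time-ordered 64-bit ids."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock = clock          # returns current time in milliseconds
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & SEQ_MASK
                if self._seq == 0:
                    # Sequence exhausted for this millisecond: wait for the next.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            return (((now - EPOCH_MS) << (WORKER_BITS + SEQ_BITS))
                    | (self.worker_id << SEQ_BITS)
                    | self._seq)


def timestamp_ms(snowflake_id):
    # Recover the creation time (ms since the Unix epoch) from an id.
    return (snowflake_id >> (WORKER_BITS + SEQ_BITS)) + EPOCH_MS


gen = SnowflakeLikeIds(worker_id=3)
first, second = gen.next_id(), gen.next_id()
print(first < second, timestamp_ms(first))
```

Because the timestamp occupies the high bits, ids sort by creation time, which suits requirements 3 and 8. Under RandomPartitioner row keys are hashed before placement, so time-ordered keys should not by themselves concentrate rows on one node (requirement 6).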