incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Revised: Data Modeling advise for Cassandra 0.8 (added #8)
Date Thu, 31 Mar 2011 12:53:21 GMT
Drew, 
	The Twissandra project is a Twitter clone built on Cassandra; it may give you some insight
into how things can be modelled: https://github.com/thobbs/twissandra

	If you are just starting then consider something like...

	- CF to hold the user, their data and their network links
	- standard CF to hold a blog entry, key is a timestamp
	- standard CF to hold blog comments, each comment as a single column where the name is a
	  long timestamp
	- standard CF to hold the blogs for a user, key is the user id and each column is the blog
	  key

That's not a great schema, but it's a simple starting point you can build on and refine using
things like secondary indexes and doing more/less in the same CF.
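
To make the four CFs above a bit more concrete, here is a rough in-memory sketch in Python. It is not Cassandra client code: plain dicts stand in for column families, sorted() stands in for the ordering a column comparator would give you, and every name in it (users, blog_entries, blog_comments, user_blogs and the helper functions) is made up for illustration. The "follows:<id>" column naming for network links is also just an assumption.

```python
# Each "column family" is modelled as {row_key: {column_name: value}}.
# All names here are hypothetical; this only illustrates the layout.
users = {}          # user id -> profile columns plus "follows:<id>" link columns
blog_entries = {}   # blog key (a timestamp) -> entry columns
blog_comments = {}  # blog key -> {comment timestamp: comment text}
user_blogs = {}     # user id -> {blog key: ""}


def add_user(user_id, name):
    users[user_id] = {"name": name}


def follow(user_id, other_id):
    # A network link is stored as one extra column in the user's row.
    users[user_id]["follows:%d" % other_id] = ""


def add_blog(user_id, blog_key, title, body):
    # Write the entry itself, then the per-user index column pointing at it.
    blog_entries[blog_key] = {"user_id": user_id, "title": title, "body": body}
    user_blogs.setdefault(user_id, {})[blog_key] = ""


def add_comment(blog_key, timestamp, text):
    # One column per comment; the column name is the comment's timestamp.
    blog_comments.setdefault(blog_key, {})[timestamp] = text


def blogs_for_user(user_id):
    # Blog keys are timestamps, so reverse order means newest first.
    return sorted(user_blogs.get(user_id, {}), reverse=True)


def comments_for_blog(blog_key):
    # Ascending column order means oldest comment first.
    return [text for _, text in sorted(blog_comments.get(blog_key, {}).items())]
```

With this layout, getting a user's blogs newest first or a blog's comments oldest first is a single-row read in each case, which is the main reason for keeping one row per user and one row per blog.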

Good luck. 
Aaron

On 30 Mar 2011, at 15:13, Drew Kutcharian wrote:

> I'm pretty new to Cassandra and I would like to get your advice on modeling. The object
> model of the project that I'm working on will be pretty close to Blogger, Tumblr, etc. (or
> any other blogging website).
> You have Users, each of whom can have many Blogs, and each Blog can have many comments.
> How would you model this efficiently considering:
> 
> 1) Be able to directly link to a User
> 2) Be able to directly link to a Blog
> 3) Be able to query and get all the Blogs for a User ordered by time created descending
> (new blogs first)
> 4) Be able to query and get all the Comments for each Blog ordered by time created ascending
> (old comments first)
> 5) Be able to link different Users to each other, as a network.
> 6) Have a well distributed hash so we don't end up with "hot" nodes, while the rest of
> the nodes are idle
> 7) It would be nice to show a User how many Blogs they have or how many comments are
> on a Blog, without iterating through the whole dataset.
> NEW: 8) Be able to query for the most recently added Blogs. For example, Blogs added
> today, this week, this month, etc.
> 
> The target Cassandra version is 0.8 to use the Secondary Indexes. The goal is to be very
> efficient, so no Text keys. We were thinking of using time-based 64-bit IDs, using Snowflake.
> 
> Thanks,
> 
> Drew

