cassandra-user mailing list archives

From Aaron Morton <aa...@thelastpickle.com>
Subject Re: Schema Question
Date Tue, 25 Jan 2011 00:39:35 GMT
Sam,
The best advice is to jump in and try any schema. If you are just starting out, start simple;
you're going to rewrite it several times. Worry about scale later; in most cases it's going
to work.

Some general points:

- do not create CFs on the fly.
- work out your common read requests and denormalise to support them; the writes will be
fast enough.
- try to have each read request resolved by reading from a single CF (not a rule, just
a guideline).
- avoid big super columns.
- this may also be interesting http://www.rackspacecloud.com/blog/2010/05/12/cassandra-by-example/
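
To make the first three points concrete, here is a minimal sketch in plain Python. It uses in-memory dicts as stand-ins for column families and does not call any Cassandra client API. The CF names (posts, posts_by_category), the helper functions and the (pubDate, slug) column name are all made up for illustration; treat it as one possible layout, not a recommendation for your exact app.

```python
from collections import defaultdict

# In-memory stand-ins for two fixed column families (hypothetical names).
# Each maps row key -> {column name -> column value}.
posts = {}                             # row key: post slug
posts_by_category = defaultdict(dict)  # row key: category name


def write_post(slug, category, title, body, pub_date):
    """Write the post, then denormalise a summary into the index CF."""
    posts[slug] = {
        "title": title,
        "body": body,
        "category": category,
        "pubDate": pub_date,
    }
    # The column name (pub_date, slug) keeps a row's columns ordered by time.
    posts_by_category[category][(pub_date, slug)] = title


def latest_in_category(category, limit=10):
    """Serve a category listing from a single row of a single CF."""
    row = posts_by_category.get(category, {})
    newest_first = sorted(row.items(), reverse=True)[:limit]
    return [(slug, title) for (_, slug), title in newest_first]


write_post("i-got-a-new-guitar", "music", "My new guitar", "...", 1250558004)
write_post("another-cool-guitar", "music", "Another guitar", "...", 1250644404)
write_post("scream-is-the-best-movie-ever", "movies", "Scream", "...", 1250600000)

print(latest_in_category("music"))
```

The point of the sketch is that a new category becomes a new row in an existing CF rather than a new CF, and the listing read touches only posts_by_category. The cost is a second write per post, which is the trade the denormalisation point above is describing.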

If you are happy with the one in the article, start with that and see how it works with your
app, in particular for your read activities.

Hope that helps. 
Aaron


On 25 Jan, 2011, at 12:47 PM, Sam Hodgson <hodgson_sam@hotmail.com> wrote:

Hi all, 

I'm brand new to Cassandra. I'm migrating from MySQL for a large forum site and would be grateful
if anyone can give me some basic pointers on schema design, or any recommended documentation.


The example used in http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model is very
close to, if not exactly, what I need for my main CF:
<!--
    ColumnFamily: BlogEntries
    This is where all the blog entries will go:

    Row Key => post's slug (the seo friendly portion of the uri)
    Column Name: an attribute for the entry (title, body, etc)
    Column Value: value of the associated attribute

    Access: grab an entry by slug (always fetch all Columns for Row)

    fyi: tags is a denormalization... its a comma separated list of tags.
    im not using json in order to not interfere with our
    notation but obviously you could use anything as long as your app
    knows how to deal w/ it

    BlogEntries : { // CF
        i-got-a-new-guitar : { // row key - the unique "slug" of the entry.
            title: This is a blog entry about my new, awesome guitar,
            body: this is a cool entry. etc etc yada yada
            author: Arin Sarkissian  // a row key into the Authors CF
            tags: life,guitar,music  // comma sep list of tags (basic denormalization)
            pubDate: 1250558004      // unixtime for publish date
            slug: i-got-a-new-guitar
        },
        // all other entries
        another-cool-guitar : {
            ...
            tags: guitar,
            slug: another-cool-guitar
        },
        scream-is-the-best-movie-ever : {
            ...
            tags: movie,horror,
            slug: scream-is-the-best-movie-ever
        }
    }
-->
<ColumnFamily CompareWith="BytesType" Name="BlogEntries"/>

How well would this scale? Say you are storing 5 million posts and looking to scale that up:
would it be better to segment them into several column families, and if so, to what extent?


I could create column families to store posts for each category; however, I'd end up with thousands
of CFs.
That said, the data would then be stored in a well-sorted manner for querying/presenting.

My DB is very write-heavy and growing fast, so Cassandra sounds like the best solution.
Any advice is greatly appreciated!! 

Thanks

Sam

