cassandra-user mailing list archives

From Jonathan Ellis
Subject Re: Visual representation of Cassandra data model
Date Thu, 13 Aug 2009 03:35:43 GMT
Thanks for taking a stab at this, Mark.

I'm not a fan of teaching this by showing CF-spanning rows.  (The
Bigtable paper does this, IIRC, but it's wrong. :)

You can have data in different CFs with the same key, yes, but all
that means is they will be stored on the same nodes.  Each CF is
stored separately on disk and queried separately and the common case
is that they _won't_ have keys in common, rather than the reverse.
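
To make that concrete, here is a rough sketch in Python.  It is a toy
model, not Cassandra's API: the node names, the modulo placement (a
stand-in for the real partitioner) and the Users/Bookmarks/Tags column
families are all assumptions for illustration.  The point is only that
placement depends on the key alone, while each CF is its own map and is
read on its own.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster


def node_for(key: str) -> str:
    """Toy placement: depends only on the row key, never on the CF."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]


# Each column family is an independent map:
#   row key -> {column name: (value, user-supplied timestamp)}
users = {"mark": {"name": ("Mark", 1)}}
bookmarks = {"http://example.org/": {"title": ("Example", 1)}}
tags = {"nosql": {"description": ("non-relational stores", 1)}}

keyspace = {"Users": users, "Bookmarks": bookmarks, "Tags": tags}


def get(cf: str, key: str):
    """A read names exactly one column family; no row spans CFs."""
    return keyspace[cf].get(key)


def shared_keys(cf_a: str, cf_b: str) -> set:
    """Keys that happen to exist in both column families."""
    return set(keyspace[cf_a]) & set(keyspace[cf_b])
```

If "mark" were also a key in Tags, node_for("mark") would put both
entries on the same node, but get("Users", "mark") and
get("Tags", "mark") would still be two unrelated lookups.  In this
sketch, as in the common case, shared_keys() between any two CFs is
empty.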


On Wed, Aug 12, 2009 at 10:24 PM, Mark McBride wrote:
> Is this clearer?  I had the key names set up as <type>:<id> just to
> keep it simple and put everything in one keyspace.  Ditto the super
> column, although I guess that could be spread out into three things,
> or you could spread it out into three keyspaces.  Not sure what best
> practices there are.
> What I'd like to do (and I'll get started on this tonight) is start
> with a problem statement, and then go about building up a
> storage-conf.xml file with this structure, showing API examples along
> the way.  So while this is a final picture, there would be simpler
> ones up front.
>   ---Mark
> On Wed, Aug 12, 2009 at 5:35 PM, Ryan King wrote:
>> A few quick comments:
>> * it's not clear what column family the super column you're using is in.
>> * it might be useful to include the timestamps in the columns (since
>> they're user-supplied)
>> * given that the colon-delimited api has been removed, it might be
>> easier to explain the data model without such strings
>> * why would you mix different kinds of data in the same column family,
>> rather than having separate column families for each? (users,
>> bookmarks, tags)
>> -ryan
>> On Wed, Aug 12, 2009 at 4:57 PM, Mark McBride wrote:
>>> While working on an updated data model wiki page I'm trying to put
>>> together a graphical representation of the data model.  I threw this
>>> together based on Curt's goal of modeling delicious.  The basic gist
>>> is descriptive data for tags, users, and bookmarks goes in the
>>> Description column family.  The relationships between bookmarks, tags
>>> and users go in the map supercolumn.  I'm not sure this is how you
>>> would do it in production (I'm guessing at the very least you'd want
>>> separate supercolumns for bookmarks, tags and users), but it seems to
>>> be simple enough for a new user to digest, and covers all the bases of
>>> the data model (aside from ordering, I guess).  So, two questions:
>>> 1) did I get it right (I'm new to this as well)?
>>> 2) is this a useful representation?
>>>  ---Mark
