cassandra-user mailing list archives

From Matt Corgan
Subject SuperColumn vs range of Columns
Date Fri, 11 Sep 2009 00:57:36 GMT
I've been watching some of the Cassandra presentation videos and
looking through slides and the website, but I'm still missing the
motivation behind SuperColumns.

1) What is the difference between a SuperColumn like:

homeAddress: {
  street: "1234 x street",
  city: "san francisco",
  zip: "94107"
}

and the BigTable or HBase style of concatenating nested keys together
into something like:

homeAddress/street: "1234 x street",
homeAddress/city: "san francisco",
homeAddress/zip: "94107"

Wouldn’t they be sorted the same way on disk and be similarly
efficient for range queries?  Is it that you avoid storing the string
“homeAddress” redundantly?  Maybe that really adds up if you’re doing
inbox search and storing billions of doc ids where the column name is
several times the size of the doc id.  Seems like BigTable/HBase could
get a similar benefit by using prefix compression and omitting the
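
To make the comparison in question 1 concrete, here is a minimal Python sketch. It is not Cassandra or HBase code: the in-memory dict, the `flatten` and `prefix_scan` helpers, and the extra `workAddress` entry are all made up for illustration. It only shows that concatenated keys sort next to each other, so a prefix range scan returns the same columns as the nested layout, at the cost of repeating the prefix in every key.

```python
from bisect import bisect_left

# Nested, SuperColumn-style layout (hypothetical in-memory model).
nested = {
    "homeAddress": {
        "street": "1234 x street",
        "city": "san francisco",
        "zip": "94107",
    },
    # Made-up second group, only here to show where the range scan stops.
    "workAddress": {"city": "palo alto"},
}


def flatten(rows):
    """Concatenate outer and inner keys with '/' and sort, BigTable/HBase style."""
    return sorted(
        (f"{outer}/{inner}", value)
        for outer, cols in rows.items()
        for inner, value in cols.items()
    )


def prefix_scan(flat, prefix):
    """Range query: seek to the first key >= prefix, read until the prefix ends."""
    keys = [key for key, _ in flat]
    start = bisect_left(keys, prefix)
    result = []
    for key, value in flat[start:]:
        if not key.startswith(prefix):
            break
        result.append((key, value))
    return result


flat = flatten(nested)
address = prefix_scan(flat, "homeAddress/")

# Bytes spent repeating the prefix, assuming no prefix compression.
redundant_bytes = len("homeAddress/") * len(address)

print(address)
print(redundant_bytes)
```

Under these assumptions the scan returns the three `homeAddress/` columns in the same sorted order as the inner keys of the nested version; the only visible difference is the repeated prefix.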

2) Can SuperColumns only add one level of nesting beyond normal
columns? That seems limiting considering BigTable and HBase can append
an arbitrary number of nested keys together.

3) Can you update the columns in the row of a supercolumn without
overwriting the whole row? For example, if a Facebook user sends his
10,000th message with the word Steelers in it, does that mean all
10,000 columns need to be overwritten (something like 100KB), or can a
single column be squeezed into the front of a supercolumn?  Similarly,
can you read a fraction of a SuperColumn without pulling the whole
thing to the client?

As far as I can tell, the only benefit of a SuperColumn over a bunch
of Columns stored together is the savings you get by not storing the
column name and timestamp over and over.  What am I missing?

Thanks!  (Maybe this could be added to an FAQ section on the project wiki.)

