cassandra-user mailing list archives

From Jonathan Ellis
Subject Re: SuperColumn vs range of Columns
Date Sun, 13 Sep 2009 02:12:36 GMT
On Thu, Sep 10, 2009 at 7:57 PM, Matt Corgan wrote:
> 1) What is the difference between a super-column like:
> homeAddress: {
>  street: "1234 x street",
>  city: "san francisco",
>  zip: "94107",
> }
> and the BigTable or HBase style of concatenating nested keys together
> into something like:
> homeAddress/street: "1234 x street",
> homeAddress/city: "san francisco",
> homeAddress/zip: "94107"
> Wouldn’t they be sorted the same way on disk and be similarly
> efficient for range queries?  Is it that you avoid storing the string
> “homeAddress” redundantly?

[Note that in Cassandra we refer to column "names" to avoid confusion
w/ row "keys."]

Supercolumns are primarily useful when your column set is not fixed.
Cassandra can currently handle up to a million or so columns per row
without problems, and with a little work could handle billions.  So
treating a row as an associative array with dynamic column names that
are determined at runtime is a totally legitimate thing to do.  If you
are storing "objects" like address data, a supercolumn maps more
closely to what you would think of in an OO language as

Map<String, Address> addresses

rather than having to treat each field separately:

Map<String, String> streets
Map<String, String> cities
Map<String, String> zips

Besides being a more natural fit for the data, grouping related data
like this makes your row-level index of column names much more
effective than repeating the name N times for N fields.
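
To make the comparison concrete, here is a minimal in-memory sketch of
the two layouts using plain Java maps.  This is only an illustration of
the data shapes; the class and method names are made up for this
example and are not Cassandra API.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative model only: these names are hypothetical, not Cassandra API.
final class SuperColumnSketch {

    // Supercolumn style: supercolumn name -> (subcolumn name -> value).
    static SortedMap<String, SortedMap<String, String>> superColumnRow() {
        SortedMap<String, String> home = new TreeMap<>();
        home.put("street", "1234 x street");
        home.put("city", "san francisco");
        home.put("zip", "94107");

        SortedMap<String, SortedMap<String, String>> row = new TreeMap<>();
        row.put("homeAddress", home);
        return row;
    }

    // Flattened style: the group name is repeated in every column name.
    static SortedMap<String, String> flattenedRow() {
        SortedMap<String, String> row = new TreeMap<>();
        row.put("homeAddress/street", "1234 x street");
        row.put("homeAddress/city", "san francisco");
        row.put("homeAddress/zip", "94107");
        return row;
    }

    public static void main(String[] args) {
        // One top-level name covers the whole address...
        System.out.println(superColumnRow().keySet());
        // ...versus one top-level name per field.
        System.out.println(flattenedRow().keySet());
    }
}
```

In the first layout the top-level name set has one entry per address;
in the second it has one entry per field, each repeating the
"homeAddress" prefix.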

> 2) Can SuperColumns only add one level of nesting beyond normal
> columns? That seems limiting considering BigTable and HBase can append
> an arbitrary number of nested keys together.

Yes, only one level of nesting.

Remember, column names are just a byte[].  You can still smush column
names together if you want to.  You don't need my permission. :)

(Although needing more than one level of nesting is often a sign you
should rethink your row model.)
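
If you do want to smush names together, a rough sketch of one way to
do it on the client side follows.  Everything here is an assumption
made for illustration: the '/' separator, the helper names, and the
use of a TreeMap with String ordering to stand in for a sorted row.
The ordering you actually get depends on how your column names are
compared, and the parts must not contain the separator themselves.

```java
import java.nio.charset.StandardCharsets;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical client-side helper, not part of Cassandra.
final class CompositeNameSketch {
    static final char SEP = '/'; // assumed separator; parts must not contain it

    // Pack nested name parts into a single byte[] column name.
    static byte[] pack(String... parts) {
        return String.join(String.valueOf(SEP), parts)
                     .getBytes(StandardCharsets.UTF_8);
    }

    // Split a packed column name back into its parts.
    static String[] unpack(byte[] name) {
        return new String(name, StandardCharsets.UTF_8)
                .split(String.valueOf(SEP));
    }

    // Return every column whose name starts with prefix + SEP.
    static SortedMap<String, String> slice(SortedMap<String, String> row,
                                           String prefix) {
        String start = prefix + SEP;
        // The character right after SEP gives the first name past the range.
        String end = prefix + (char) (SEP + 1);
        return row.subMap(start, end);
    }

    public static void main(String[] args) {
        SortedMap<String, String> row = new TreeMap<>();
        row.put("addresses/home/city", "san francisco");
        row.put("addresses/home/zip", "94107");
        row.put("addresses/work/city", "oakland");
        System.out.println(slice(row, "addresses/home").keySet());
    }
}
```

The slice works because all names sharing a prefix sort next to each
other, which is the same property the concatenated-key style relies on
for range queries.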

> 3) Can you update the columns in the row of a supercolumn without
> overwriting the whole row?


> Similarly,
> can you read a fraction of a SuperColumn without pulling the whole
> thing to the client?


