On Thu, Sep 10, 2009 at 7:57 PM, Matt Corgan <mcorgan@hotpads.com> wrote:
> 1) What is the difference between a super-column like:
>
> homeAddress: {
> street: "1234 x street",
> city: "san francisco",
> zip: "94107",
> }
>
> and the BigTable or HBase style of concatenating nested keys together
> into something like:
>
> homeAddress/street: "1234 x street",
> homeAddress/city: "san francisco",
> homeAddress/zip: "94017"
>
> Wouldn’t they be sorted the same way on disk and be similarly
> efficient for range queries? Is it that you avoid storing the string
> “homeAddress” redundantly?
[Note that in Cassandra we refer to column "names" to avoid confusion
w/ row "keys."]
This is primarily useful when your column set is not fixed. Cassandra
can currently handle up to a million or so columns without problems,
and with a little work could handle billions. So treating a row as an
associative array with dynamic column names that are determined at
runtime is a totally legitimate thing to do. So if you are storing
"objects" like address data, a supercolumn maps more closely to what
you would think of in an OO language as
    Map<String, Address> addresses
rather than having to treat each field separately:
    Map<String, String> streets
    Map<String, String> cities
    Map<String, String> zip
Besides being a more natural fit for the data, your row-level index of
column names is much more effective when related data is grouped like
this, than when you repeat the name N times for N fields.
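
To make the comparison concrete, here is a minimal in-memory sketch of the two layouts using sorted maps. It is only a model of the idea, not Cassandra's API: the class name RowSketch and both methods are hypothetical, and a TreeMap stands in for the sorted column storage.

```java
import java.util.TreeMap;

// Hypothetical in-memory model of one row; this is NOT Cassandra's API.
class RowSketch {
    // Super column layout: one top-level name ("homeAddress") with the
    // individual fields nested beneath it as subcolumns.
    static TreeMap<String, TreeMap<String, String>> superColumns() {
        TreeMap<String, String> home = new TreeMap<>();
        home.put("street", "1234 x street");
        home.put("city", "san francisco");
        home.put("zip", "94107");

        TreeMap<String, TreeMap<String, String>> row = new TreeMap<>();
        row.put("homeAddress", home);
        return row;
    }

    // Flattened layout: the prefix is repeated in every column name, so the
    // row carries one top-level name per field instead of one per address.
    static TreeMap<String, String> flattened() {
        TreeMap<String, String> row = new TreeMap<>();
        row.put("homeAddress/street", "1234 x street");
        row.put("homeAddress/city", "san francisco");
        row.put("homeAddress/zip", "94107");
        return row;
    }
}
```

In this model the super column row has a single top-level name to index, while the flattened row has three, one for each field of the address.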
> 2) Can SuperColumns only add one level of nesting beyond normal
> columns? That seems limiting considering BigTable and HBase can append
> an arbitrary number of nested keys together.
Yes, only one level of nesting.
Remember, column names are just a byte[]. You can still smush column
names together if you want to. You don't need my permission. :)
(Although needing more than one level of nesting is often a sign you
should rethink your row model.)
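
As a rough illustration of smushing names together, the sketch below joins path components into a single byte[] column name. The ColumnNames class, the compose helper, and the choice of '/' as separator (borrowed from the example in the question) are assumptions made for illustration; nothing here is part of Cassandra.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for building composite column names by hand.
class ColumnNames {
    // Assumed separator; components must not contain it themselves.
    static final byte SEPARATOR = '/';

    // Joins the UTF-8 bytes of each component with the separator,
    // e.g. ("homeAddress", "street") becomes the bytes of "homeAddress/street".
    static byte[] compose(String... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                out.write(SEPARATOR);
            }
            byte[] bytes = parts[i].getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
        }
        return out.toByteArray();
    }
}
```

Because the result is an ordinary byte[], any number of components can be nested this way, at the cost of repeating the shared prefix in every name.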
> 3) Can you update the columns in the row of a supercolumn without
> overwriting the whole row?
Yes.
> Similarly,
> can you read a fraction of a SuperColumn without pulling the whole
> thing to the client?
Yes.
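
For intuition only, both answers can be modeled with nested sorted maps: writing one subcolumn leaves its siblings untouched, and a read can return just a range of subcolumn names. The SuperColumnOps class and its methods are hypothetical stand-ins, not the Cassandra client API.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model of super column reads and writes.
class SuperColumnOps {
    final TreeMap<String, TreeMap<String, String>> row = new TreeMap<>();

    // Writes a single subcolumn; other subcolumns under the same
    // super column are left as they were.
    void putSubColumn(String superName, String subName, String value) {
        row.computeIfAbsent(superName, k -> new TreeMap<>()).put(subName, value);
    }

    // Returns a copy of only the subcolumns whose names fall in
    // [from, to], rather than the whole super column.
    SortedMap<String, String> slice(String superName, String from, String to) {
        TreeMap<String, String> subColumns = row.get(superName);
        if (subColumns == null) {
            return new TreeMap<>();
        }
        return new TreeMap<>(subColumns.subMap(from, true, to, true));
    }
}
```

For example, after writing street, city, and zip under "homeAddress", overwriting zip changes only that one entry, and slice("homeAddress", "city", "street") returns just city and street.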
-Jonathan