Date: Sat, 5 Jun 2010 17:33:24 -0500
Subject: Re: Conditional get
From: Jonathan Shook <jshook@gmail.com>
To: user@cassandra.apache.org

It sounds like you are getting a handle on it, but maybe in a roundabout way.
Here are some ways I like to conceptualize Cassandra. Maybe they can
shorten your walk.

Either the grid analogy or the maps-of-maps analogy can apply, as they
both map conceptually to the way that we use a column family.

--

The maps-of-maps analogy:
Please try to think of the "column" as the intersection between a row
key and a column name. This captures the most essential concepts.
It's easier for me to think of it as a sorted map of sorted maps, where:
* the outer map is the set of rows, whose (map) keys and (map)
values are (Cassandra) keys and (Cassandra) rows
* the inner map for each row key is the set of columns whose keys and
values are column names and column data.
* column data is essentially a molecule of (column name, column value,
storage timestamp). It can be thought of as the "value", but it is
stored as a 3-tuple.
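
A minimal sketch of the maps-of-maps picture, in plain Python rather than
any Cassandra client API. The names Column, put, get_column and
columns_in_order are made up for illustration, and since Python dicts are
not sorted maps, the sketch sorts on read to stand in for the sorted inner
map:

```python
from typing import Dict, List, NamedTuple, Optional


class Column(NamedTuple):
    """The 3-tuple 'molecule' stored at each (row key, column name)."""
    name: str
    value: str
    timestamp: int


# Outer map: row key -> row. Inner map: column name -> Column.
ColumnFamily = Dict[str, Dict[str, Column]]


def put(cf: ColumnFamily, row_key: str, name: str, value: str, timestamp: int) -> None:
    """Store a column at the intersection of row_key and name."""
    cf.setdefault(row_key, {})[name] = Column(name, value, timestamp)


def get_column(cf: ColumnFamily, row_key: str, name: str) -> Optional[Column]:
    """Look up the row first, then the column; None if either is undefined."""
    return cf.get(row_key, {}).get(name)


def columns_in_order(cf: ColumnFamily, row_key: str) -> List[Column]:
    """Return a row's columns sorted by column name."""
    row = cf.get(row_key, {})
    return [row[name] for name in sorted(row)]


cf: ColumnFamily = {}
put(cf, "john doe", "age", "47", 1)
put(cf, "john doe", "address", "1 Main St", 2)
print(get_column(cf, "john doe", "age"))
print([c.name for c in columns_in_order(cf, "john doe")])
```

The point of the sketch is only the shape: one lookup by row key, then one
lookup by column name, with the column itself carrying its own name and
timestamp.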

--

The grid analogy: (This one is my favorite)
In the grid analogy, rows may be undefined. Rows that are defined may
have columns that are undefined.
Two things to think about when using this analogy:
* Cassandra doesn't have to store undefined values, except during
deletes and before anti-entropy takes them away.
* Cassandra operates behind the scenes in row-major order. That means
that while you can think of it in terms of a Cartesian intersection, you
should know that rows will always be accessed first.
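
To make the grid picture concrete, here is a rough sketch of a sparse grid
held in row-major order. It is plain Python, not Cassandra code; the sample
data and the names cell and stored_cells are invented for illustration:

```python
from typing import Dict, Optional

# Only defined cells are stored. "row2" does not exist at all, and
# "row1" simply has no entry for column "b".
grid: Dict[str, Dict[str, str]] = {
    "row1": {"a": "1", "c": "3"},
    "row3": {"b": "2"},
}


def cell(grid: Dict[str, Dict[str, str]], row_key: str, column_name: str) -> Optional[str]:
    """Row-major lookup: find the row first, then the column within it."""
    row = grid.get(row_key)
    if row is None:
        return None  # undefined row
    return row.get(column_name)  # None if the column is undefined


def stored_cells(grid: Dict[str, Dict[str, str]]) -> int:
    """Count the cells that actually take up storage."""
    return sum(len(row) for row in grid.values())


print(cell(grid, "row1", "a"))
print(cell(grid, "row2", "a"))
print(stored_cells(grid))
```

A 3x3 grid is implied by the row and column names here, but only three
cells are stored, and every lookup goes through the row before it reaches
the column.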

--

Another layer outward is the column family, which is also a map.

Another layer inward is the sub-column, which is also a map.
Don't get confused by super columns or sub columns. Super/Sub columns
are really API sugar to reduce some of the work of using your own
serialized aggregates within a normal column value. I find that the
confusion is usually not worth the trouble when starting out. On the
other hand, were you to implement your own aggregate types within a
column value, the purpose of super/sub columns would seem obvious.
It's just a little overly complex because of the supporting types in
the API. Since this was basically bolted on to the standard column
support, the core Cassandra machinery treats it with normal column
behavior.
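
As a sketch of what "your own serialized aggregates within a normal column
value" could look like: the code below packs a small map of sub-values into
a single column value. JSON is my assumption here, purely for illustration;
any serialization would do, and none of these function names come from the
Cassandra API.

```python
import json
from typing import Dict


def pack(subcolumns: Dict[str, str]) -> str:
    """Serialize a map of sub-values into one normal column value."""
    return json.dumps(subcolumns, sort_keys=True)


def unpack(column_value: str) -> Dict[str, str]:
    """Recover the map of sub-values from a column value."""
    return json.loads(column_value)


def set_subvalue(row: Dict[str, str], column_name: str, sub_name: str, sub_value: str) -> None:
    """Read-modify-write one entry of the aggregate stored in a column."""
    aggregate = unpack(row[column_name]) if column_name in row else {}
    aggregate[sub_name] = sub_value
    row[column_name] = pack(aggregate)


# A row is just column name -> column value in this sketch.
row: Dict[str, str] = {}
set_subvalue(row, "address", "city", "Austin")
set_subvalue(row, "address", "zip", "78701")
print(unpack(row["address"]))
```

Having to read, modify and rewrite the whole value for each change is the
kind of work that super/sub columns save you.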

Neither the column family layer nor the sub-column layer has been
given the same attention as the basic row->column mapping with respect
to performance and scalability.
This may change in the future. For now, consider row keys and column
names to be the places where Cassandra scales best.

Jonathan


On Sat, Jun 5, 2010 at 4:06 PM, Peter Schuller
<peter.schuller@infidyne.com> wrote:
>> Eric wrote a good explanation with sample code at
>> http://www.rackspacecloud.com/blog/2010/05/12/cassandra-by-example/
>
> Regarding the schema description and analogy problem mentioned in the
> article; I found that reading the BigTable paper helped a lot for me.
> It seemed very useful to me to think of a ColumnFamily in Cassandra as
> a sorted (on keys) on-disk table of entries with efficiency guarantees
> with respect to range queries and locality on disk.
>
> Please correct me if I am wrong, but the data model as I now
> understand it essentially boils down to a sorted table of the form
> (readers who don't know the answer, please don't assume I'm right
> unless someone in the know confirms it; I don't want to add to the
> confusion):
>
>  rowkeyN+0,columnM+0 data
>  rowkeyN+0,columnM+1 data
>  ...
>  rowkeyN+1 data
>  rowkeyN+2 data
>  ...
>
> Where each piece of "data" is the column (I am ignoring super
> columns for now).
>
> The table, other than being sorted, is indexed on row key and column name.
>
> Is this correct?
>
> In my head I think of it as there being some N amount of "keys" (not
> the cassandra term) that are interesting to the application, which end
> up mapping to the actual "key" (not the cassandra term) in the table.
> So, in a column family "users", we might have a "john doe" whose age
> is "47". This means we have a "key" (not the cassandra term) which is
> "users,john doe,age" and whose value is "47" (ignoring time stamps and
> ignoring keys that contain commas, and ignoring column names being
> semantically part of the data).
>
> So, given:
>
>       users,john doe,age
>
> We have, in cassandra terms:
>
>  column family: users
>  key: john doe
>  column name: age
>
> The fact that different column families are in different files, to me,
> seems mostly to be an implementation detail, since performance
> characteristics (sorting, locality on disk) should be the same as if
> it were just one huge table (ignoring compaction concerns, etc.).
>
> The API exposed by cassandra is not one of a generalized multi-level
> key, but rather one with specific concepts of ColumnFamily, Column and
> SuperColumn. These essentially provide a two-level key (in the case
> of a CF with C:s) and a three-level key (in the case of a CF with SC:s
> with C:s), with the caveat that three-level keys are still only
> indexed on their first two components (even though they are still
> sorted on disk).
>
> Does this make sense at all? Provided that I have not misunderstood
> the model completely and am completely wrong, I find this a much more
> natural way to think of the underlying storage semantics.
>
> --
> / Peter Schuller
>
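
For comparison, the flattened sorted-table view from the quoted text can be
sketched as one sorted list of composite keys. This is only an illustration
in plain Python: the FlatTable class and its methods are hypothetical, and
nothing here reflects how Cassandra actually lays out its files.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str, str]  # (column family, row key, column name)


class FlatTable:
    """One sorted table keyed on (column family, row key, column name)."""

    def __init__(self) -> None:
        self.keys: List[Key] = []       # kept sorted at all times
        self.data: Dict[Key, str] = {}  # composite key -> value

    def put(self, cf: str, row_key: str, column_name: str, value: str) -> None:
        key = (cf, row_key, column_name)
        if key not in self.data:
            bisect.insort(self.keys, key)
        self.data[key] = value

    def get(self, cf: str, row_key: str, column_name: str) -> Optional[str]:
        return self.data.get((cf, row_key, column_name))

    def row_slice(self, cf: str, row_key: str) -> List[Tuple[str, str]]:
        """All columns of one row, found as a contiguous run of sorted keys."""
        start = bisect.bisect_left(self.keys, (cf, row_key))
        end = start
        while end < len(self.keys) and self.keys[end][:2] == (cf, row_key):
            end += 1
        return [(k[2], self.data[k]) for k in self.keys[start:end]]


table = FlatTable()
table.put("users", "john doe", "age", "47")
table.put("users", "john doe", "address", "1 Main St")
table.put("users", "jane roe", "age", "30")
print(table.get("users", "john doe", "age"))
print(table.row_slice("users", "john doe"))
```

Because the keys sort by row key before column name, all the columns of a
row sit next to each other, which is the locality property the quoted text
is describing.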