Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: local policy)
Message-ID: <4C3FDF48.2090101@fourkitchens.com>
Date: Fri, 16 Jul 2010 04:25:44 +0000
From: David Strauss <david@fourkitchens.com>
Organization: Four Kitchens
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US;
 rv:1.9.1.10) Gecko/20100512 Thunderbird/3.0.5
MIME-Version: 1.0
To: user@cassandra.apache.org
Subject: Re: A very short summary on Cassandra for a book
References: <AANLkTimlIUMTGGlJgcHwiqKv0DWuXlsd8FJ69iZAfJAv@mail.gmail.com>
 <AANLkTin2BHoJBumu4s3-1A79hcGAyGEeVObESzegZmee@mail.gmail.com>
In-Reply-To: <AANLkTin2BHoJBumu4s3-1A79hcGAyGEeVObESzegZmee@mail.gmail.com>

On 2010-07-16 01:57, Dave Viner wrote:
> I am no expert... but parts seem accurate, parts not.
>
> "Cassandra stores four or five dimension associated arrays"
> not sure what you're counting as a dimension of the associated array,
> but here are the 2 associative array-like syntaxes:
>
> ColumnFamily[row-key][column-name] = value1
> ColumnFamily[row-key][super-column-name][column-name] = value2

You're forgetting the first dimension: the keyspace. However, that
dimension is mostly a scope for configuration and administration, just
like MySQL "databases" on a single MySQL instance.
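
One way to picture the dimensions is as nested maps, one lookup per dimension. This is a mental model only, not how Cassandra stores data: it ignores column timestamps and the comparator-defined column ordering, and the keyspace, column family, and row names ("Blog", "Users", "Posts", "dave") are made up for illustration.

```python
# Mental model only: nested dicts, one lookup per dimension.
# All keyspace, column family, row, and column names are hypothetical.
cluster = {
    "Blog": {  # keyspace: mostly a configuration/administration scope
        # Standard column family: ColumnFamily[row-key][column-name] = value
        "Users": {
            "dave": {"email": "dave@example.com", "name": "Dave"},
        },
        # Super column family:
        # ColumnFamily[row-key][super-column-name][column-name] = value
        "Posts": {
            "dave": {
                "2010-07-16": {"title": "Cassandra", "body": "Hello"},
            },
        },
    },
}


def lookup(data, *path):
    """Follow one key per dimension through the nested maps."""
    node = data
    for key in path:
        node = node[key]
    return node


# Four dimensions for a standard column, five for a super column.
print(lookup(cluster, "Blog", "Users", "dave", "email"))
print(lookup(cluster, "Blog", "Posts", "dave", "2010-07-16", "title"))
```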

> "The first dimension is fixed on creation of the database but the
> rest can be infinitely large"
> I don't understand this sentence.  The definition of a ColumnFamily is
> set by the configuration file (storage-conf.xml).  If you change it, and
> restart a node, that node will use the new definition of the CF.

For a book, I would avoid pinning down what's dynamic at runtime and
what's fixed at startup because that's changing rapidly with upcoming
versions. Cassandra 0.7 adds dynamic keyspace and column family
creation, and its release is expected well before the end of 2010.

Even now, it's possible to modify most configurations with no disruption
via a rolling cluster restart.

> It is true that the number of columns can be large.  I have no idea if
> it's actually infinite - but more or less.

There is no hard cap on the number of columns in a row. Real-world
systems are known to comfortably scale to millions of columns per row.

In current Cassandra releases, however, each super-column must fit into
memory. This is because the current architecture treats a super-column
much like a single column: its sub-columns aren't indexed, so reading any
of them deserializes the whole super-column. A fix is planned for future
releases, but there's also interest in a broader overhaul allowing
arbitrary dimensionality, so I wouldn't count on any change soon.

Also -- and this isn't much of a restriction -- each row must fit on a
single node's disk.

> Also, it's probably not precise to call it a database, since that tends
> to invoke images of things like MySQL, Oracle, Postgres, etc.

Those are *relational* databases. Historically, "database" has been a
general term for persistent data stores.

> "Inserts are super fast and can happen to any
> database server in the cluster."
> Yes, this is true.

Not 100% true. The sharding/partitioning mechanism in Cassandra assigns
each row to at least one server in the cluster (more if the replication
factor is higher than one). It's possible to "write" to any server in the
cluster, which forwards the write to the row's replicas, but the write
only completes once confirmed on an appropriate number of nodes (based on
ConsistencyLevel).
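
A simplified sketch of how a row key maps to its replicas, assuming an MD5 token loosely in the spirit of RandomPartitioner and rack-unaware placement (the first node at or after the key's token on the ring, then the next nodes clockwise). The ring tokens, node names, and the token() and replicas() helpers are hypothetical; Cassandra's actual token computation and placement strategies differ in detail.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """MD5-based token, loosely in the spirit of RandomPartitioner."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def replicas(key: str, ring: list, rf: int) -> list:
    """Return the nodes holding `key` under rack-unaware placement.

    `ring` is a list of (token, node) pairs sorted by token. The first
    replica is the first node whose token is >= the key's token (wrapping
    past the end of the ring); the rest are the next nodes clockwise.
    """
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(min(rf, len(ring)))]


# Hypothetical four-node ring with evenly spaced tokens in [0, 2**128).
ring = [(0, "A"), (2**126, "B"), (2**127, "C"), (3 * 2**126, "D")]
# Whichever node receives the write forwards it to these replicas.
print(replicas("user:42", ring, 3))
```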

ConsistencyLevel.ZERO is a special exception that allows nearly blind
writes to any node in the cluster, asynchronously replicating the data
to the proper nodes, but most applications use at least
ConsistencyLevel.ONE for any serious writes.
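
A rough sketch of how many replica acknowledgements a write waits for at a few ConsistencyLevel values, given a replication factor rf. It covers only ZERO and ONE (mentioned above) plus QUORUM and ALL, ignores hinted handoff and the data-center-aware levels, and the acks_required() and write_succeeds() helpers are hypothetical.

```python
from enum import Enum


class ConsistencyLevel(Enum):
    # Subset of levels; names mirror the ones discussed in this thread.
    ZERO = "ZERO"
    ONE = "ONE"
    QUORUM = "QUORUM"
    ALL = "ALL"


def acks_required(cl: ConsistencyLevel, rf: int) -> int:
    """Replica acknowledgements a write waits for (simplified)."""
    if cl is ConsistencyLevel.ZERO:
        return 0  # returns immediately; replication happens asynchronously
    if cl is ConsistencyLevel.ONE:
        return 1
    if cl is ConsistencyLevel.QUORUM:
        return rf // 2 + 1  # a strict majority of the replicas
    return rf  # ALL: every replica must confirm


def write_succeeds(cl: ConsistencyLevel, rf: int, acks: int) -> bool:
    return acks >= acks_required(cl, rf)


for cl in ConsistencyLevel:
    print(cl.name, acks_required(cl, rf=3))  # ZERO 0, ONE 1, QUORUM 2, ALL 3
```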

The replication topology also affects write latency. Using
RackAwareStrategy, which places one replica in a different data center,
Cassandra will often require a confirmed write at a remote location.

Cassandra intentionally lets applications decide, per request, how to
trade read and write latency against consistency guarantees. So, I'd say
writes in Cassandra are "as fast as your consistency and durability
requirements allow."
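
The usual rule of thumb for that tradeoff: if the number of replicas read (R) plus the number that must confirm a write (W) exceeds the replication factor (N), every read set overlaps every write set, so at least one replica consulted holds the latest write. This sketch checks the R + W > N shortcut against a brute-force overlap test; it ignores failures and clock skew, and the function names are illustrative.

```python
from itertools import combinations


def quorum_rule(r: int, w: int, n: int) -> bool:
    """R + W > N: every read set must share a replica with every write set."""
    return r + w > n


def always_overlap(r: int, w: int, n: int) -> bool:
    """Brute force: check every possible read set against every write set."""
    nodes = range(n)
    return all(
        set(read) & set(write)
        for read in combinations(nodes, r)
        for write in combinations(nodes, w)
    )


n = 3
# (R, W): ONE read/ONE write, QUORUM/QUORUM, ONE read/ALL write
for r, w in [(1, 1), (2, 2), (1, 3)]:
    print(r, w, quorum_rule(r, w, n), always_overlap(r, w, n))
```

Lowering either R or W below that threshold buys latency at the cost of possibly reading stale data.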

> "However, the system is append only there so there is no in-place update
> operation like increment"
> The first part is not quite true.  There is appending, but there is no
> increment that's guaranteed universal.  Cassandra is "eventually
> consistent".  So atomic increment doesn't really work in the "eventual"
> world.  But, more precisely, one can add, update, change, modify, delete
> rows, columns, and values at any time from any node.

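The lack of increment support has little to do with eventual consistency
and everything to do with timestamp-based conflict resolution: the write
with the newest timestamp wins, so if two clients concurrently read a
value and each writes back value + 1, only one increment survives. With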
vector clocks (likely landing in 0.7 as a result of Digg's work), it
will be possible to support increment and decrement operations, just not
ones that give you an instant, unique result. The actual inc and dec
support probably won't be in 0.7, though.
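
A minimal sketch of why timestamp-based conflict resolution loses concurrent increments. The Column class and reconcile() are hypothetical simplifications (the newer timestamp wins; ties go to the first argument as an arbitrary choice), not Cassandra's code, and they say nothing about how vector clocks would behave.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    value: int
    timestamp: int


def reconcile(a: Column, b: Column) -> Column:
    """Last write wins: keep the column with the newer timestamp."""
    return a if a.timestamp >= b.timestamp else b


stored = Column(value=5, timestamp=100)

# Two clients read the counter concurrently; both see 5.
seen_by_1 = stored.value
seen_by_2 = stored.value

# Each writes back its own read-modify-write result.
write_1 = Column(value=seen_by_1 + 1, timestamp=101)
write_2 = Column(value=seen_by_2 + 1, timestamp=102)

final = reconcile(reconcile(stored, write_1), write_2)
print(final.value)  # 6, not 7: one increment was silently lost
```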

> "Also sorting happens on insert time"
> Yes, I believe this is true.

Basically true. I could nitpick, but it wouldn't add much clarity to the
discussion.
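
A minimal model of "sorting on insert": each row keeps its column names ordered as columns are written, so a range slice is just two binary searches. This sketch assumes plain string ordering; in Cassandra the order comes from the column family's configured comparator (CompareWith), and the Row class and column_slice() method are hypothetical.

```python
import bisect


class Row:
    """Hypothetical row that keeps column names sorted as they are written."""

    def __init__(self):
        self._names = []   # column names, always in sorted order
        self._values = {}  # column name -> value

    def insert(self, name: str, value: str) -> None:
        if name not in self._values:
            bisect.insort(self._names, name)  # sort cost paid at write time
        self._values[name] = value  # writing an existing column overwrites it

    def column_slice(self, start: str, finish: str) -> list:
        """Columns with start <= name <= finish, in name order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        return [(name, self._values[name]) for name in self._names[lo:hi]]


row = Row()
for name in ["2010-07-16", "2010-07-01", "2010-07-09"]:
    row.insert(name, "event")
print(row.column_slice("2010-07-01", "2010-07-10"))
```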

-- 
David Strauss
   | david@fourkitchens.com
   | +1 512 577 5827 [mobile]
Four Kitchens
   | http://fourkitchens.com
   | +1 512 454 6659 [office]
   | +1 512 870 8453 [direct]

