incubator-cassandra-user mailing list archives

From David Strauss
Subject Re: A very short summary on Cassandra for a book
Date Fri, 16 Jul 2010 04:25:44 GMT
On 2010-07-16 01:57, Dave Viner wrote:
> I am no expert... but parts seem accurate, parts not.
> "Cassandra stores four or five dimension associated arrays"
> not sure what you're counting as a dimension of the associated array,
> but here are the 2 associative array-like syntaxes:
> ColumnFamily[row-key][column-name] = value1
> ColumnFamily[row-key][super-column-name][column-name] = value2

You're forgetting the first dimension: the keyspace. However, that
dimension is mostly a scope for configuration and administration, just
like MySQL "databases" on a single MySQL instance.
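
To make the "four or five dimensions" count concrete, here is a rough sketch that models the layout as plain nested Python dicts. This is not Cassandra's API; the keyspace, column family, and key names are placeholders made up for illustration.

```python
# Illustration only: Cassandra's data model as nested dicts.
# All names here (Keyspace1, Standard1, Super1, ...) are hypothetical.
store = {
    "Keyspace1": {
        # Standard column family: row key -> column name -> value
        "Standard1": {"row-key": {"column-name": "value1"}},
        # Super column family: row key -> super column -> column name -> value
        "Super1": {"row-key": {"super-column-name": {"column-name": "value2"}}},
    }
}

def lookup(data, path):
    """Walk the nested dicts one dimension at a time."""
    node = data
    for key in path:
        node = node[key]
    return node

# Four dimensions for a standard column family ...
standard_path = ("Keyspace1", "Standard1", "row-key", "column-name")
# ... and five once a super column is involved.
super_path = ("Keyspace1", "Super1", "row-key", "super-column-name", "column-name")

print(len(standard_path), lookup(store, standard_path))
print(len(super_path), lookup(store, super_path))
```

Counting the keyspace and the column family as dimensions is what gets you to four or five; counting only the keys under a column family gives the two or three in the quoted syntax.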

> "The first dimension is fixed on creation of the database but the
> rest can be infinitely large"
> I don't understand this sentence.  The definition of a ColumnFamily is
> set by the configuration file (storage-conf.xml).  If you change it, and
> restart a node, that node will use the new definition of the CF.

For a book, I would avoid pinning down what's dynamic at runtime and
what's fixed at startup because that's changing rapidly with upcoming
versions. Cassandra 0.7 features dynamic keyspace and column family
creation, and its release is going to happen well before the end of 2010.

Even now, it's possible to modify most configurations with no disruption
via a rolling cluster restart.

> It is true that the number of columns can be large.  I have no idea if
> it's actually infinite - but more or less.

There is no hard cap on the number of columns in a row. Real-world
systems are known to comfortably scale to millions of columns per row.

In current Cassandra releases, however, each super-column must fit into
memory. This is because the current architecture treats super-columns
and columns very similarly. While it's planned to change this for future
releases, there's interest in a broader overhaul allowing arbitrary
dimensionality; I wouldn't count on any change soon.

Also -- and this isn't much of a restriction -- each row must fit on a
single node's disk.

> Also, it's probably not precise to call it a database, since that tends
> to invoke images of things like MySQL, Oracle, Postgres, etc.  

Those are *relational* databases. Historically, "database" has been a
general term for persistent data stores.

> "Inserts are super fast and can happen to any
> database server in the cluster."
> Yes, this is true.

Not 100% true. The sharding/partitioning mechanism in Cassandra assigns
each row to at least one server in the cluster (more if the replication
factor is higher than one). It's possible to send a write to any server in
the cluster, but the write only completes once it is confirmed on an
appropriate number of replica nodes (based on ConsistencyLevel).

ConsistencyLevel.ZERO is a special exception that allows nearly blind
writes to any node in the cluster, asynchronously replicating the data
to the proper nodes, but most applications use at least
ConsistencyLevel.ONE for any serious writes.
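
As a rough sketch of how the required number of confirmations varies with ConsistencyLevel, consider the Python below. The function names are made up for illustration and are not part of any Cassandra client; only ZERO, ONE, QUORUM, and ALL are modeled, and QUORUM is assumed to mean a simple majority of the replicas.

```python
# Illustration only: how many replica acknowledgements a write waits for.
# required_acks and write_completes are hypothetical helpers, not a real API.

def required_acks(level, replication_factor):
    """Return the number of replica confirmations a write must collect."""
    if level == "ZERO":
        return 0  # fire and forget: nothing is awaited
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # assumed: simple majority
    if level == "ALL":
        return replication_factor
    raise ValueError("unknown consistency level: %r" % level)

def write_completes(level, replication_factor, acks_received):
    """True once enough replicas have confirmed the write."""
    return acks_received >= required_acks(level, replication_factor)

for level in ("ZERO", "ONE", "QUORUM", "ALL"):
    print(level, required_acks(level, replication_factor=3))
```

The more confirmations a level demands, the more the slowest required replica dominates write latency.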

The replication topology also affects write latency. Using a RackAware
approach, Cassandra will often require a confirmed write at a remote

Cassandra intentionally allows applications to dynamically decide read
and write latency tradeoffs against consistency guarantees. So, I'd say
writes in Cassandra are "as fast as your consistency and durability
requirements allow."

> "However, the system is append only there so there is no in-place update
> operation like increment"
> The first part is not quite true.  There is appending, but there is no
> increment that's guaranteed universal.  Cassandra is "eventually
> consistent".  So atomic increment doesn't really work in the "eventual"
> world.  But, more precisely, one can add, update, change, modify, delete
> rows, columns, and values at any time from any node.

The lack of increment support has little to do with eventual consistency
and everything to do with timestamp-based conflict resolution. With
vector clocks (likely landing in 0.7 as a result of Digg's work), it
will be possible to support increment and decrement operations, just not
ones that give you an instant, unique result. The actual inc and dec
support probably won't be in 0.7, though.
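
To illustrate why timestamp-based resolution gets in the way of increments, here is a minimal Python sketch. It assumes a simplified model where each column version is a (timestamp, value) pair and the highest timestamp wins; the tie-breaking rule and all names are assumptions for the example, not Cassandra internals.

```python
# Illustration only: last-write-wins resolution loses a concurrent increment.

def resolve(a, b):
    """Keep the version with the higher timestamp (ties go to `a` here)."""
    return a if a[0] >= b[0] else b

initial = (1, 10)  # (timestamp, counter value)

# Two clients read the same value and each write back "value + 1".
client_a = (2, initial[1] + 1)
client_b = (3, initial[1] + 1)

# The replicas reconcile by timestamp alone, so one write simply replaces
# the other instead of the two increments being combined.
merged = resolve(client_a, client_b)
print(merged)  # the counter ends at 11, although two increments were issued
```

Resolving by timestamp picks one whole value and discards the other, so there is nothing from which to reconstruct the intended total of 12.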

> "Also sorting happens on insert time"
> Yes, I believe this is true.

Basically true. I could nitpick, but it wouldn't add much clarity to the

David Strauss
   | +1 512 577 5827 [mobile]
Four Kitchens
   | +1 512 454 6659 [office]
   | +1 512 870 8453 [direct]
