From Joe Stump
Subject Re: Digg's data model
Date Sat, 20 Mar 2010 15:24:02 GMT

On Mar 20, 2010, at 2:53 AM, Lenin Gali wrote:

> 1. Eventual consistency: Given a volume of 5K writes / sec and roughly 1500 writes are
Updates per sec while the rest are inserts, what kind of latency can be expected in eventual

Depending on the size of the cluster, you're not looking at much latency at all: on the
order of tens of milliseconds.

> 2. Performance: Are there any bench marks on how many writes /sec and reads/sec cassandra
supports on an "n node" cluster? a Node can be of variable size and would like to know the
hardware/software details of the cluster as well. 

Cassandra's performance is impressive. We had one node spike at 103,000 reads a second with
a load average of about 6, which is high, but not alarmingly so.

> 3. EC2: Has any one implemented cassandra on EC2 and what kind transaction volume are
they using it for and how is their experience with cassandra on EC2?.

We have a 15-node cluster on EC2. We have a patch that adds a rack-aware replication
strategy specifically for EC2 availability zones (AZs); it places replicas so that each AZ
holds one copy of every key. We run Cassandra across 3 AZs on large instances with the
ephemeral drives in a RAID0 setup with XFS.
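
A rough sketch of the "one copy per AZ" placement idea, in Python. This is not the actual
patch: the ring layout, node names, AZ names, and the function az_aware_replicas are all
made up for illustration. It assumes a sorted token ring and walks clockwise from the
key's owner, skipping nodes whose AZ already holds a copy.

```python
from bisect import bisect_left

# Hypothetical ring: (token, node name, availability zone), sorted by token.
EXAMPLE_RING = [
    (10, "n1", "us-east-1a"),
    (20, "n2", "us-east-1a"),
    (30, "n3", "us-east-1b"),
    (40, "n4", "us-east-1c"),
    (50, "n5", "us-east-1b"),
    (60, "n6", "us-east-1c"),
]


def az_aware_replicas(ring, key_token, replicas=3):
    """Pick up to `replicas` nodes for a key, at most one per AZ.

    The first node whose token is >= key_token owns the key (wrapping
    around the ring); later replicas are the next nodes clockwise that
    sit in an AZ not used yet.
    """
    tokens = [token for token, _, _ in ring]
    start = bisect_left(tokens, key_token) % len(ring)

    chosen = []
    seen_azs = set()
    for step in range(len(ring)):
        _, node, az = ring[(start + step) % len(ring)]
        if az in seen_azs:
            continue  # this AZ already holds a copy of the key
        chosen.append(node)
        seen_azs.add(az)
        if len(chosen) == replicas:
            break
    return chosen


if __name__ == "__main__":
    print(az_aware_replicas(EXAMPLE_RING, 15))
```

With fewer AZs than replicas, this sketch simply returns fewer nodes; a real strategy
would need a fallback rule for that case.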

You might also be interested in this:

> 4. Overhead and issues: What are typical nightmare scenario's one could face when using
Cassandra for heavy write / read intensive systems?

We haven't run into any, but when we do find hot spots in the cluster we bootstrap a new node
into the cluster with a token that splits the hot range and alleviates the hot spot. This
has been rather painless in our experience.
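
As an illustration of picking such a token, here is a minimal Python sketch. It assumes
integer tokens on a ring of size 2**127 (the token space I'd expect with the
RandomPartitioner) and simply halves the hot node's range. The function name
midpoint_token and the sample tokens are hypothetical.

```python
# Assumed token space: integers in [0, 2**127), wrapping around.
RING_SIZE = 2 ** 127


def midpoint_token(prev_token, hot_token, ring_size=RING_SIZE):
    """Return the token halfway through the range (prev_token, hot_token].

    hot_token is the token of the overloaded node and prev_token is the
    token of its predecessor on the ring. A new node bootstrapped with the
    returned token takes over the first half of the hot node's range.
    """
    span = (hot_token - prev_token) % ring_size
    if span == 0:
        span = ring_size  # a single node owns the whole ring
    return (prev_token + span // 2) % ring_size


if __name__ == "__main__":
    # Hot node at token 200, predecessor at token 100.
    print(midpoint_token(100, 200))
```

Halving the range assumes the load is spread evenly across it; if a handful of keys are
responsible for the hot spot, a token closer to those keys may work better.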

> 5. Backups : If there is a  4 or 5 TB cassandra cluster what do you recommend the backup
scenario's could be?

There isn't one that I know of. This is what the replication factor is for. We keep three
copies of each key in three different datacenters. That's our backup strategy. 

> Also, Does cassandra support counters?

Not yet, but there's active work happening in this area.

> Digg's article said they are going to contribute their work to open source any idea when
that would be?

Chris Goffinet is a committer to Cassandra. Digg's work is contributed back on an almost
daily basis.

