cassandra-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Johan Oskarsson <>
Subject [DISCUSSION] High-volume counters in Cassandra
Date Thu, 02 Sep 2010 19:01:20 GMT
In the last few months Digg and Twitter have been using a counter patch that lets Cassandra
act as a high-volume realtime counting system. Atomic counters enable new applications that
were previously difficult to implement at scale, including realtime analytics and large-scale
systems monitoring.

There are currently two different suggestions for how to implement counters in Cassandra.
The discussion has so far been limited to those following the jiras (CASSANDRA-1072 and CASSANDRA-1421)
closely and we don’t seem to be nearing a decision. I want to open it up to the Cassandra
community at large to get additional feedback.

Below are very basic and brief introductions to the alternatives. Please help us move forward
by reading through the docs and jiras and reply to this thread with your thoughts. Would one
or the other, both or neither be suitable for inclusion in Cassandra? Is there a third option?
What can we do to reach a decision?

We believe that both options can coexist; their strengths and weaknesses make them suitable
for different use cases.

CASSANDRA-1072 + CASSANDRA-1397 (see design doc)

How does it work?
A node is picked as the primary replica for each write. The context byte array for a column
contains (primary replica ip, value). Any previous data with the same ip is reconciled with
the new increment and put as the column value.

Concerns raised
* an increment in flight will be lost if the wrong node goes down
* if an increment operation times out it’s impossible to know if it has been executed or

The most recent jira comment proposes a new API method for increments that reflects the different
consistency level guarantees.


How does it work?
Each increment for a counter is stored as a (UUID, value) tuple. The read operations will
read all these increment tuples for a counter, reconcile and return. On a regular interval
the values are all read and reconciled into one value to reduce the amount of data required
for each read operation. 

Concerns raised
* poor read performance, especially for time-series data
* post aggregation reconciliation issues

Again, we feel that both options can co-exist, especially if the 1072 patch uses a new API
method that reflects its different consistency level guarantees. Our proposal is to accept
1072 into trunk with the new API method, and when an implementation of 1421 is completed it
can be accepted alongside.
View raw message