incubator-cassandra-dev mailing list archives

From Ben Standefer
Subject Re: [DISCUSSION] High-volume counters in Cassandra
Date Thu, 02 Sep 2010 19:28:15 GMT
At SimpleGeo, we're close to just merging 1072 internally.  I've talked with
several members of the community who have already done this and are running
1072 in production or quasi-production.  It seems that if this isn't merged,
people are going to merge it internally anyway.  I think such a widely
desired feature should not be left as a patch for users to merge themselves.
I like the idea of including both approaches and letting users choose between
them based on their requirements.

-Ben Standefer

On Thu, Sep 2, 2010 at 12:01 PM, Johan Oskarsson wrote:

> In the last few months Digg and Twitter have been using a counter patch
> that lets Cassandra act as a high-volume realtime counting system. Atomic
> counters enable new applications that were previously difficult to implement
> at scale, including realtime analytics and large-scale systems monitoring.
>
> Discussion
> There are currently two different suggestions for how to implement counters
> in Cassandra. The discussion has so far been limited to those following the
> jiras (CASSANDRA-1072 and CASSANDRA-1421) closely and we don’t seem to be
> nearing a decision. I want to open it up to the Cassandra community at large
> to get additional feedback.
> Below are very basic and brief introductions to the alternatives. Please
> help us move forward by reading through the docs and jiras and reply to this
> thread with your thoughts. Would one or the other, both or neither be
> suitable for inclusion in Cassandra? Is there a third option? What can we do
> to reach a decision?
> We believe that both options can coexist; their strengths and weaknesses
> make them suitable for different use cases.
>
> CASSANDRA-1072 (see design doc)
>
> How does it work?
> A node is picked as the primary replica for each write. The context byte
> array for a column contains (primary replica ip, value). Any previous data
> with the same ip is reconciled with the new increment and put as the column
> value.
> Concerns raised
> * an increment in flight will be lost if the wrong node goes down
> * if an increment operation times out it’s impossible to know if it has
> been executed or not
> The most recent jira comment proposes a new API method for increments that
> reflects the different consistency level guarantees.
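A rough sketch of the scheme described above, to make the discussion concrete. This is not the 1072 patch's code: the dict standing in for the context byte array, the function names, and the assumption that a read sums the per-replica values are all illustrative guesses on my part.

```python
from typing import Dict

# Hypothetical stand-in for the context byte array:
# primary replica IP -> partial count owned by that replica.
Context = Dict[str, int]


def apply_increment(context: Context, primary_ip: str, delta: int) -> Context:
    """Return a new context with `delta` folded into the entry for `primary_ip`.

    Previous data with the same IP is reconciled (here: summed) with the
    new increment; entries for other IPs are left untouched.
    """
    updated = dict(context)
    updated[primary_ip] = updated.get(primary_ip, 0) + delta
    return updated


def counter_value(context: Context) -> int:
    """Assumed read path: the total is the sum of all per-replica partial counts."""
    return sum(context.values())


ctx: Context = {}
ctx = apply_increment(ctx, "10.0.0.1", 5)  # first write, primary is 10.0.0.1
ctx = apply_increment(ctx, "10.0.0.2", 3)  # a different primary is picked
ctx = apply_increment(ctx, "10.0.0.1", 2)  # reconciled with the earlier 5
print(ctx, counter_value(ctx))
```

In this sketch each increment lives only in the entry of the primary replica that accepted it, which is one way to read the first concern in the list above.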
>
> CASSANDRA-1421
>
> How does it work?
> Each increment for a counter is stored as a (UUID, value) tuple. The read
> operations will read all these increment tuples for a counter, reconcile and
> return. On a regular interval the values are all read and reconciled into
> one value to reduce the amount of data required for each read operation.
> Concerns raised
> * poor read performance, especially for time-series data
> * post aggregation reconciliation issues
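A comparable sketch of the (UUID, value) scheme described above. Again, this is only an illustration under my own assumptions, not the 1421 design itself: the dict of tuples, the function names, the union-by-UUID merge, and the way `compact` collapses everything into one fresh tuple are all hypothetical.

```python
import uuid
from typing import Dict

# Hypothetical stand-in for the stored increments of one counter:
# increment UUID -> delta.
Increments = Dict[uuid.UUID, int]


def record_increment(increments: Increments, delta: int) -> uuid.UUID:
    """Store one increment as its own (UUID, value) tuple and return its UUID."""
    key = uuid.uuid4()
    increments[key] = delta
    return key


def read_counter(increments: Increments) -> int:
    """Read path: reconcile by summing every increment tuple."""
    return sum(increments.values())


def merge_replicas(a: Increments, b: Increments) -> Increments:
    """Assumed reconcile step: union keyed by UUID, so a tuple seen twice counts once."""
    merged = dict(a)
    merged.update(b)
    return merged


def compact(increments: Increments) -> Increments:
    """Periodic aggregation: collapse all tuples into a single tuple."""
    if not increments:
        return {}
    return {uuid.uuid4(): read_counter(increments)}


incs: Increments = {}
for delta in (5, 3, 2):
    record_increment(incs, delta)
print(len(incs), read_counter(incs))            # three tuples to read
print(len(compact(incs)), read_counter(compact(incs)))  # one tuple after compaction
```

The read cost in this sketch grows with the number of uncompacted tuples. Also, once `compact` has run, the original UUIDs are gone, so a late copy of an already-aggregated increment can no longer be recognized by the merge step.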
>
> Again, we feel that both options can co-exist, especially if the 1072 patch
> uses a new API method that reflects its different consistency level
> guarantees. Our proposal is to accept 1072 into trunk with the new API
> method, and when an implementation of 1421 is completed it can be accepted
> alongside.
