incubator-cassandra-dev mailing list archives

From Zhu Han <>
Subject Re: [DISCUSSION] High-volume counters in Cassandra
Date Sun, 05 Sep 2010 09:24:37 GMT
+1 for Jonathan Ellis.

I might not be on the same page as you active community members, but I'm
wondering: why not put this feature into a popular client library or ship
it as a contrib package?

In CASSANDRA-1072 + CASSANDRA-1397, the counter increment is not
idempotent, so it is difficult to align with Cassandra's consistency
model.  It's not worth adding a lot of code to the core code base just to
serve a single feature.
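
To make the idempotency concern concrete, here is a minimal Python sketch.
It is not the CASSANDRA-1072 code; `ReplicaCounter` and `apply_increment`
are hypothetical names, and it assumes a client that retries after a
timeout without knowing whether the first attempt was applied.

```python
class ReplicaCounter:
    """Toy counter: each increment is added blindly to a running total."""

    def __init__(self):
        self.total = 0

    def apply_increment(self, delta):
        # No operation id is recorded, so a replayed increment
        # cannot be told apart from a new one.
        self.total += delta


counter = ReplicaCounter()
counter.apply_increment(5)   # first attempt is applied, but the ack is lost
counter.apply_increment(5)   # client times out and retries the same write
print(counter.total)         # 10, although the client meant to add 5 once
```

With ordinary column writes, replaying the same write gives the same
result, which is why a blind retry is safe there and not here.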

In CASSANDRA-1421, the increment is idempotent and is easier to align with
Cassandra. However, read performance could be poor because a read has to
reconcile a lot of columns. Memory consumption on a Cassandra node might
also be much higher than with the above approach, if I understand it
correctly.
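
A rough Python sketch of the (UUID, value) idea, under my own assumptions:
`UuidCounter` and its methods are hypothetical, the client picks the UUID,
and a plain dict stands in for the columns of a row. It only illustrates
why a retry is harmless and why reads grow with the number of increments.

```python
import uuid


class UuidCounter:
    """Toy counter storing one column per increment, keyed by a UUID."""

    def __init__(self):
        self.columns = {}  # column name (UUID) -> delta

    def apply_increment(self, op_id, delta):
        # Writing the same (op_id, delta) twice overwrites the same column,
        # so a retry does not change the result.
        self.columns[op_id] = delta

    def read(self):
        # Every read has to reconcile all stored columns.
        return sum(self.columns.values())

    def compact(self):
        # Periodic aggregation: fold all columns into a single one.
        total = self.read()
        self.columns = {uuid.uuid4(): total}


counter = UuidCounter()
op = uuid.uuid4()
counter.apply_increment(op, 5)
counter.apply_increment(op, 5)            # retry of the same operation
counter.apply_increment(uuid.uuid4(), 3)
print(counter.read(), len(counter.columns))  # 8 2
```

In this toy version, a retry that arrives after `compact()` would no longer
match its original column and would be counted again.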

If you decide to put the feature into the client library, the client library
can take the same approach as CASSANDRA-142 and serialize the increments
from a single writer to limit the number of columns generated.  If the
writers of a single counter are just hundreds of processes, I don't think it
is a big deal for

If you worry about performance on the client side because the client
serializes the increments of a single counter, maintain a queue for each
counter; it is then easy to batch multiple updates in the same queue.
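
A minimal sketch of the per-counter queue, in Python. All names here
(`BatchingCounterClient`, `write_fn`) are hypothetical, and `write_fn` is
only a stand-in for whatever call the client library uses to write one
increment column.

```python
from collections import defaultdict, deque


class BatchingCounterClient:
    """Toy client: one queue per counter; flush merges queued deltas."""

    def __init__(self, write_fn):
        self.queues = defaultdict(deque)  # counter name -> pending deltas
        self.write_fn = write_fn          # hypothetical callback doing the write

    def increment(self, name, delta=1):
        self.queues[name].append(delta)

    def flush(self):
        for name, queue in self.queues.items():
            if queue:
                batched = sum(queue)
                queue.clear()
                self.write_fn(name, batched)  # one write per counter per flush


writes = []
client = BatchingCounterClient(lambda name, delta: writes.append((name, delta)))
for _ in range(3):
    client.increment("page_views")
client.increment("logins", 2)
client.flush()
print(writes)  # [('page_views', 3), ('logins', 2)]
```

Three increments of the same counter become a single write, so the number
of columns generated depends on how often you flush, not on how often you
increment.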

best regards,

On Fri, Sep 3, 2010 at 4:55 AM, Jonathan Ellis <> wrote:

> I still have not seen any response to my other misgivings about 1072
> that I have raised on the ticket.  Specifically, the existing patch is
> based around a Clock structure that, since 580 is a dead end, is no
> longer necessary.
> I'm also uneasy about adding 200k of code that meshes as poorly with
> the rest of Cassandra as this does.  The more it can be split off into
> separate code paths, the better.  Adding its own thrift method is a
> good start, but it should go deeper than that.
> On Thu, Sep 2, 2010 at 12:01 PM, Johan Oskarsson <> wrote:
> > In the last few months Digg and Twitter have been using a counter patch
> > that lets Cassandra act as a high-volume realtime counting system. Atomic
> > counters enable new applications that were previously difficult to implement
> > at scale, including realtime analytics and large-scale systems monitoring.
> >
> > Discussion
> > There are currently two different suggestions for how to implement
> > counters in Cassandra. The discussion has so far been limited to those
> > following the jiras (CASSANDRA-1072 and CASSANDRA-1421) closely and we don’t
> > seem to be nearing a decision. I want to open it up to the Cassandra
> > community at large to get additional feedback.
> >
> > Below are very basic and brief introductions to the alternatives. Please
> > help us move forward by reading through the docs and jiras and replying to this
> > thread with your thoughts. Would one or the other, both or neither be
> > suitable for inclusion in Cassandra? Is there a third option? What can we do
> > to reach a decision?
> >
> > We believe that both options can coexist; their strengths and weaknesses
> > make them suitable for different use cases.
> >
> >
> > (see design doc)
> >
> >
> > How does it work?
> > A node is picked as the primary replica for each write. The context byte
> > array for a column contains (primary replica ip, value). Any previous data
> > with the same ip is reconciled with the new increment and put as the column
> > value.
> >
> > Concerns raised
> > * an increment in flight will be lost if the wrong node goes down
> > * if an increment operation times out it’s impossible to know if it has
> >   been executed or not
> >
> > The most recent jira comment proposes a new API method for increments
> > that reflects the different consistency level guarantees.
> >
> >
> > CASSANDRA-1421
> >
> >
> > How does it work?
> > Each increment for a counter is stored as a (UUID, value) tuple. The read
> > operations will read all these increment tuples for a counter, reconcile and
> > return. On a regular interval the values are all read and reconciled into
> > one value to reduce the amount of data required for each read operation.
> >
> > Concerns raised
> > * poor read performance, especially for time-series data
> > * post aggregation reconciliation issues
> >
> >
> > Again, we feel that both options can co-exist, especially if the 1072
> > patch uses a new API method that reflects its different consistency level
> > guarantees. Our proposal is to accept 1072 into trunk with the new API
> > method, and when an implementation of 1421 is completed it can be accepted
> > alongside.
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
