incubator-cassandra-dev mailing list archives

From Zhu Han <>
Subject Re: [DISCUSSION] High-volume counters in Cassandra
Date Sun, 05 Sep 2010 10:01:10 GMT
I thought about it again for a while.  It might be a good trade-off to just
implement "CASSANDRA-1421" as a new API, confine the new code to the
StorageProxy level, and never depend on any internal mechanism of
Cassandra, e.g. compaction, membership management and other complicated

If it does not pollute the core code base, it will be easier to refine it, or
to remove it when a better idea comes along.

If so, the number of writers of a single counter is at most equal to the
number of Cassandra nodes.

If the client applies a simple optimization and sends the Thrift request only
to the Cassandra nodes that store the counter, the performance should be
almost the same as "CASSANDRA-1072 + CASSANDRA-1397", because the number of
writers then equals the number of replicas. You also save an extra round
trip. That's what I did in my project.

best regards,

On Sun, Sep 5, 2010 at 5:24 PM, Zhu Han <> wrote:

> + 1 for Jonathan Ellis.
> I might not be on the same page as you active community members, but I'm
> wondering why not put this feature in a popular client library or ship it
> as a contrib package?
> In CASSANDRA-1072 + CASSANDRA-1397, a counter increment is not
> idempotent, so it's difficult to align with the consistency model of
> Cassandra.  It's not worth putting a lot of code into the core code base
> just to serve a single feature.
> In CASSANDRA-1421, the increment is idempotent and is easier to align with
> Cassandra. However, the read performance could be poor because it has to
> reconcile a lot of columns. The memory consumption on a Cassandra node might
> be much higher than with the above approach, if I understood it correctly.
> If you decide to put the feature in the client library, the client library
> can take the same approach as CASSANDRA-142 and serialize the increments
> from a single writer to limit the number of columns generated.  If the
> writers of a single counter are only hundreds of processes, I don't think
> it is a big deal for performance.
> If you worry about performance on the client side because it serializes
> the increments of a single counter, maintain a queue for each counter;
> it's then easy to batch multiple updates in the same queue.
> best regards,
> hanzhu
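A minimal sketch of the client-side idea quoted above, under stated assumptions: each writer owns exactly one column per counter, overwrites it with its running total (so a retried write is idempotent), and drains a per-counter queue to batch updates. The class name, the method names and the in-memory `store` dict standing in for Cassandra are all hypothetical; this is not code from any of the patches.

```python
from collections import defaultdict, deque


class SingleWriterCounterClient:
    """Hypothetical client: one writer, one column per counter."""

    def __init__(self, writer_id, store):
        self.writer_id = writer_id
        # Stand-in for Cassandra (assumption): {counter: {writer_id: total}}
        self.store = store
        self.totals = defaultdict(int)     # running total written by this writer
        self.pending = defaultdict(deque)  # one queue of deltas per counter

    def increment(self, counter, delta=1):
        # Only enqueue; nothing is written until flush().
        self.pending[counter].append(delta)

    def flush(self, counter):
        # Batch every queued delta into a single column overwrite.
        batch = 0
        queue = self.pending[counter]
        while queue:
            batch += queue.popleft()
        if batch == 0:
            return
        self.totals[counter] += batch
        # Writing the running total (not the delta) makes a retry harmless.
        self.store.setdefault(counter, {})[self.writer_id] = self.totals[counter]


def read_counter(store, counter):
    # A read reconciles one column per writer, so the cost grows with the
    # number of writers, not with the number of increments.
    return sum(store.get(counter, {}).values())
```

With two writers sharing one `store`, a counter ends up with two columns however many increments were issued, and replaying a flush's write leaves the value unchanged.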
> On Fri, Sep 3, 2010 at 4:55 AM, Jonathan Ellis <> wrote:
>> I still have not seen any response to my other misgivings about 1072
>> that I have raised on the ticket.  Specifically, the existing patch is
>> based around a Clock structure that, since 580 is a dead end, is no
>> longer necessary.
>> I'm also uneasy about adding 200k of code that meshes as poorly with
>> the rest of Cassandra as this does.  The more it can be split off into
>> separate code paths, the better.  Adding its own thrift method is a
>> good start, but it should go deeper than that.
>> On Thu, Sep 2, 2010 at 12:01 PM, Johan Oskarsson <>
>> wrote:
>> > In the last few months Digg and Twitter have been using a counter patch
>> that lets Cassandra act as a high-volume realtime counting system. Atomic
>> counters enable new applications that were previously difficult to implement
>> at scale, including realtime analytics and large-scale systems monitoring.
>> >
>> > Discussion
>> > There are currently two different suggestions for how to implement
>> counters in Cassandra. The discussion has so far been limited to those
>> following the jiras (CASSANDRA-1072 and CASSANDRA-1421) closely and we don’t
>> seem to be nearing a decision. I want to open it up to the Cassandra
>> community at large to get additional feedback.
>> >
>> > Below are very basic and brief introductions to the alternatives. Please
>> help us move forward by reading through the docs and jiras and replying to this
>> thread with your thoughts. Would one or the other, both or neither be
>> suitable for inclusion in Cassandra? Is there a third option? What can we do
>> to reach a decision?
>> >
>> > We believe that both options can coexist; their strengths and weaknesses
>> make them suitable for different use cases.
>> >
>> >
>> > CASSANDRA-1072 + CASSANDRA-1397
>> > (see design doc)
>> >
>> >
>> > How does it work?
>> > A node is picked as the primary replica for each write. The context byte
>> array for a column contains (primary replica ip, value). Any previous data
>> with the same ip is reconciled with the new increment and put as the column
>> value.
>> >
>> > Concerns raised
>> > * an increment in flight will be lost if the wrong node goes down
>> > * if an increment operation times out it’s impossible to know if it has
>> been executed or not
>> >
>> > The most recent jira comment proposes a new API method for increments
>> that reflects the different consistency level guarantees.
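The CASSANDRA-1072 description above can be illustrated with a rough sketch. It assumes the context is modeled as a mapping from primary replica ip to that replica's accumulated value, and that the counter value is the sum of all entries; both the representation and the function names are assumptions for illustration, not the patch's actual data structures.

```python
def apply_increment(context, primary_ip, delta):
    """Reconcile `delta` into the entry owned by `primary_ip`.

    `context` maps primary replica ip -> accumulated value (assumed model).
    Returns a new context and leaves the input untouched.
    """
    updated = dict(context)
    updated[primary_ip] = updated.get(primary_ip, 0) + delta
    return updated


def counter_value(context):
    # Assumption: the visible counter value is the sum over all replica entries.
    return sum(context.values())
```

Because each increment is added to whatever is already stored, applying the same increment twice (for example, a client retry after a timeout) counts it twice. That is the non-idempotency raised in the concerns above.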
>> >
>> >
>> > CASSANDRA-1421
>> >
>> >
>> > How does it work?
>> > Each increment for a counter is stored as a (UUID, value) tuple. The
>> read operations will read all these increment tuples for a counter,
>> reconcile and return. On a regular interval the values are all read and
>> reconciled into one value to reduce the amount of data required for each
>> read operation.
>> >
>> > Concerns raised
>> > * poor read performance, especially for time-series data
>> > * post aggregation reconciliation issues
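For comparison, a rough sketch of the CASSANDRA-1421 description above, assuming the increments of one counter are held as an in-memory mapping from UUID to value. The class and method names are hypothetical and only illustrate the (UUID, value) idea, not the proposed implementation.

```python
import uuid


class TupleCounter:
    """Hypothetical model: one counter stored as (UUID, value) tuples."""

    def __init__(self):
        self.tuples = {}  # increment id (UUID) -> value

    def increment(self, delta, increment_id=None):
        # Re-sending the same (UUID, value) overwrites the same entry,
        # so a retried increment is idempotent.
        if increment_id is None:
            increment_id = uuid.uuid4()
        self.tuples[increment_id] = delta
        return increment_id

    def read(self):
        # A read has to reconcile every stored tuple.
        return sum(self.tuples.values())

    def aggregate(self):
        # Periodic step: collapse all tuples into one to keep reads cheap.
        total = self.read()
        self.tuples = {uuid.uuid4(): total}
```

The read cost grows with the number of tuples stored since the last aggregation, which is where the read-performance concern comes from.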
>> >
>> >
>> > Again, we feel that both options can co-exist, especially if the 1072
>> patch uses a new API method that reflects its different consistency level
>> guarantees. Our proposal is to accept 1072 into trunk with the new API
>> method, and when an implementation of 1421 is completed it can be accepted
>> alongside.
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
