incubator-cassandra-dev mailing list archives

From Ryan King <>
Subject Re: [DISCUSSION] High-volume counters in Cassandra
Date Tue, 28 Sep 2010 19:25:06 GMT
Sorry, been catching up on this.

From Twitter's perspective, CASSANDRA-1546 is probably insufficient because it
doesn't allow one to do time-series data without supercolumns (which
might work OK, but would require a good deal of work). Additionally, one of
our deployed systems already uses supercolumns of counters, which is
not feasible in this design at all.


On Tue, Sep 28, 2010 at 10:12 AM, Jeremy Hanna
<> wrote:
> Is there any feedback from Twitter and Digg and perhaps SimpleGeo people about CASSANDRA-1546?
> Would that work so that you wouldn't have to maintain a fork?
> On Sep 27, 2010, at 5:25 AM, Sylvain Lebresne wrote:
>> In CASSANDRA-1546, I propose an alternative to #1072. At its core,
>> it rewrites #1072 without the clocks structure (by splitting the clock into
>> individual columns, not unlike what Zhu Han proposed in his preceding
>> mail, but in a row instead of a super column, for reasons explained in the
>> issue).
>> But I also believe that it improves on the current patch of #1072 in
>> the following ways:
>>  - it supports increments and decrements
>>  - it supports the usual consistency levels
>>  - it proposes an (optional) solution to the idempotency problem of
>>    increments (it's optional because it has a slight performance cost
>>    that some may want to avoid if they understand the risk).
>> When I say I propose, I mean that I actually wrote the patch (attached to the JIRA
>> ticket). I've only just written it, so it is really under-tested and has a
>> few details here and there to fix, but it should already be fairly functional
>> (it passes basic system tests).
>> I welcome all comments on the patch. It has been written with the goal of
>> addressing most of the concerns that have been raised about these counters
>> over the past few months (both in terms of performance and implementation).
>> I believe it reaches this goal; hopefully others will agree.
>> --
>> Sylvain
>> On Mon, Sep 27, 2010 at 5:32 AM, Zhu Han <> wrote:
>>>  I propose a new way to solve the counter problem in cassandra-1502[1].
>>> Since I do not follow the JIRA updates very closely, I am pasting it here so
>>> that more people can comment on it and we can see whether it's feasible.
>>> "It seems we have not found a solution acceptable to everybody, so I will try to
>>> propose a new approach. Let's see whether anybody can shed some light on it
>>> and make it a reality.
>>> 1) We add a basic data structure, called a counter, which is a special type
>>> of super column.
>>> 2) The name of each column in the counter super column is the host name of
>>> a Cassandra node, and the value is the result calculated by that node.
>>> 3) WRITE PATH: Once a node receives an add/dec request for a counter, it
>>> deserializes its local counter super column and atomically updates the column
>>> named after itself. After that, it propagates the updated column value to
>>> the other replicas, just as a mutation of a normal column is propagated
>>> to other replicas. Different consistency levels can be supported as before.
>>> 4) READ PATH: Depending on the consistency level, contact several replicas,
>>> read back the counter super column as a whole, and get the latest counter
>>> value by summing all columns in the counter. Read-repair logic can work
>>> as before.
>>> IMHO, the biggest advantage of this approach is that it re-uses as many of the
>>> mechanisms already in the code as possible, so it might not be so disruptive.
>>> But adding a new Thrift API is inevitable."
>>> NB: If it's feasible, I might not be the right man to work on it, as I have
>>> not touched the internals of Cassandra for more than a year. I want to
>>> contribute something to help us reach consensus.
>>> [1]
>>> best regards,
>>> hanzhu
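The four steps in Zhu Han's proposal can be illustrated with a minimal, single-process Python sketch. It is not Cassandra code: the names Column, Replica, reconcile, write and read are hypothetical, and it assumes each host versions its own column with a local counter so that ordinary last-write-wins reconciliation (the rule normal columns use) keeps the newest total for each host. Network propagation, consistency levels and real atomicity (a node would need a lock around the read-modify-write) are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass(frozen=True)
class Column:
    value: int      # running total of all increments applied on the owning host
    timestamp: int  # assumption: per-host version number, bumped on every local write


def reconcile(a: Column, b: Column) -> Column:
    # Normal last-write-wins reconciliation: keep the newer version of the column.
    return a if a.timestamp >= b.timestamp else b


@dataclass
class Replica:
    host: str
    counter: Dict[str, Column] = field(default_factory=dict)  # host name -> column

    def add(self, delta: int) -> Column:
        # WRITE PATH: update only the column named after this host.
        old = self.counter.get(self.host, Column(0, 0))
        new = Column(old.value + delta, old.timestamp + 1)
        self.counter[self.host] = new
        return new

    def apply(self, host: str, column: Column) -> None:
        # Receive a propagated column (or a read repair) from another replica.
        current = self.counter.get(host)
        self.counter[host] = column if current is None else reconcile(current, column)


def write(coordinator: Replica, others: Iterable[Replica], delta: int) -> None:
    # The coordinator updates its own column, then ships that column to the others.
    column = coordinator.add(delta)
    for replica in others:
        replica.apply(coordinator.host, column)


def read(replicas: Iterable[Replica]) -> int:
    # READ PATH: merge the counter super columns column by column, then sum them.
    merged: Dict[str, Column] = {}
    for replica in replicas:
        for host, col in replica.counter.items():
            merged[host] = col if host not in merged else reconcile(merged[host], col)
    return sum(col.value for col in merged.values())


a, b, c = Replica("A"), Replica("B"), Replica("C")
write(a, [b, c], 5)
write(b, [a], 3)       # C misses this update
write(a, [b, c], -2)
print(read([a, b, c]))  # 6
print(read([c]))        # 3 (stale: C never saw B's +3)
```

Because each host only ever writes its own column, replicas never have to merge concurrent edits to the same column; a replica that missed an update (C above) returns a stale sum until the newer column reaches it, for example through read repair.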
>>> On Sun, Sep 26, 2010 at 9:49 PM, Jonathan Ellis <> wrote:
>>>> you have misunderstood.  if we continue the 1072 approach of writing
>>>> counter data to the clock field, this is necessarily incompatible with
>>>> the right way of writing counter data to the value field.  it's no
>>>> longer simply a matter of reversing 1070.
>>>> On Sat, Sep 25, 2010 at 11:50 PM, Zhu Han <> wrote:
>>>>> Jonathan,
>>>>> This is a personal email.
>>>>> On Sun, Sep 26, 2010 at 1:27 PM, Jonathan Ellis <> wrote:
>>>>>> On Sat, Sep 25, 2010 at 8:57 PM, Zhu Han <> wrote:
>>>>>>> Can we just let the patch be committed but mark it as "alpha" or
>>>>>>> "experimental"?
>>>>>> I explained exactly why that is not a good approach here:
>>>>> Yes, I see. But the clock structure has been in trunk since Cassandra-1070,
>>>>> and we still need to clean it out, whatever happens. We need somebody to
>>>>> volunteer to take on this work. Considering the complexity of Cassandra-1070,
>>>>> a programmer with in-depth knowledge of this patch is preferable, and it
>>>>> will take some time to do.
>>>>> Fortunately, Johan Oskarsson has promised to take it on in his comment on
>>>>> Cassandra-1072[1]:
>>>>> "The clock changes would get into trunk quicker if we didn't, avoiding the
>>>>> extra overhead of a big patch during reviews, merge with trunk, code updates
>>>>> and publication of a new patch.
>>>>> If the concern is that we won't attend to the clocks once this patch is in, I
>>>>> can promise that we'll look at it straight away."
>>>>> And if Twitter/Digg/SimpleGeo fork their own Cassandra trees, this will give
>>>>> a big marketing opportunity to supporters of other NoSQL systems. As you know,
>>>>> the competition is quite fierce currently.
>>>>> So, instead of staying stuck in this awkward situation, why not try
>>>>> another strategy:
>>>>> "Fork another experimental tree from 0.7 beta 1 and accept
>>>>> Cassandra-1072. At the same time, start the cleanup work on it.
>>>>> Once it's finalized, merge it back into 0.7, whether that is 0.7.1 or
>>>>> 0.7.2."
>>>>> That way, the folks at Twitter do not need to maintain a huge
>>>>> out-of-tree patch, while the quality impact of Cassandra-1072 is
>>>>> limited.
>>>>> I do know the pain of maintaining a large patch outside the official tree.
>>>>> Once it gets in, everybody will feel much better.
>>>>> If you give this patch a chance, Johan and others can be highly
>>>>> motivated, because the whole community works together. It's a compromise,
>>>>> but it's worth it.
>>>>> [1]
>>>>>> --
>>>>>> Jonathan Ellis
>>>>>> Project Chair, Apache Cassandra
>>>>>> co-founder of Riptano, the source for professional Cassandra support
>>>> --
>>>> Jonathan Ellis
>>>> Project Chair, Apache Cassandra
>>>> co-founder of Riptano, the source for professional Cassandra support
