incubator-cassandra-dev mailing list archives

From Jeremy Hanna <>
Subject Re: [DISCUSSION] High-volume counters in Cassandra
Date Tue, 28 Sep 2010 17:12:50 GMT
Is there any feedback from the Twitter, Digg, and perhaps SimpleGeo people about CASSANDRA-1546?
Would that approach work for you, so that you wouldn't have to maintain a fork?

On Sep 27, 2010, at 5:25 AM, Sylvain Lebresne wrote:

> In CASSANDRA-1546, I propose an alternative to #1072. At its core,
> it rewrites #1072 without the clocks structure (by splitting the clock into
> individual columns, not unlike what Zhu Han proposed in his preceding
> mail, but in a row instead of a super column, for reasons explained in the
> issue).
> I also believe it improves on the current #1072 patch in the
> following ways:
>  - it supports increments and decrements
>  - it supports the usual consistency levels
>  - it proposes an (optional) solution to the idempotency problem of
>    increments (it's optional because it has a (fairly slight) performance cost
>    that some may prefer to avoid if they understand the risk).
> When I say I propose, I mean that I wrote the patch (attached to the JIRA
> ticket). I've only just written it, so it is really under-tested and has a
> few details here and there to fix, but it should already be fairly
> functional (it passes basic system tests).
> I welcome all comments on the patch. It was written with the goal of
> addressing most of the concerns raised about these counters over the past
> few months (in terms of both performance and implementation). I believe
> it reaches this goal; hopefully others will agree.
> --
> Sylvain
> On Mon, Sep 27, 2010 at 5:32 AM, Zhu Han <> wrote:
>>  I propose a new way to solve the counter problem in cassandra-1502[1].
>> Since I do not follow the JIRA updates very carefully, I am pasting it here
>> so that more people can comment on it and we can see whether it's feasible.
>> "It seems we have not found a solution acceptable to everybody. I'll try to
>> propose a new approach. Let's see whether anybody can shed some light on it
>> and make it a reality.
>> 1) We add a basic data structure, called a counter, which is a special type
>> of super column.
>> 2) The name of each column in the counter super column is the host name of
>> a Cassandra node, and the value is the result calculated by that node.
>> 3) WRITE PATH: Once a node receives an add/dec request for a counter, it
>> deserializes its local counter super column and atomically updates the
>> column named after itself. After that, it propagates the updated column
>> value to the other replicas, just as the mutation of a normal column is
>> propagated to other replicas. Different consistency levels can be supported
>> as before.
>> 4) READ PATH: Depending on the consistency level, contact several replicas,
>> read back the whole counter super column, and get the latest counter
>> value by summing up all columns in the counter. Read-repair logic can work
>> as before.
>> IMHO, the biggest advantage of this approach is that it reuses as many
>> mechanisms already in the code as possible, so it might not be so
>> disruptive. But adding a new Thrift API is inevitable."
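The counter scheme in steps 1) to 4) above can be sketched in Python as follows. This is a minimal in-memory model, not Cassandra code: the `Replica` and `Column` classes and the `write` helper are hypothetical, propagation is a direct method call, and each node uses its own logical clock as the column timestamp (workable here because only the owning node ever writes its column). A propagated column is reconciled like a normal column, with the newer timestamp winning, and a read sums all per-node columns:

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    value: int      # running total contributed by the owning node
    timestamp: int  # owner's logical clock (assumption); newer timestamp wins


@dataclass
class Replica:
    name: str
    # The "counter super column": one sub-column per node, keyed by host name.
    counter: dict[str, Column] = field(default_factory=dict)
    clock: int = 0

    def add(self, delta: int) -> Column:
        """WRITE PATH: update only the column named after this node."""
        self.clock += 1
        old = self.counter.get(self.name, Column(0, 0))
        col = Column(old.value + delta, self.clock)
        self.counter[self.name] = col
        return col

    def apply(self, owner: str, col: Column) -> None:
        """Reconcile a propagated column like a normal column: newer timestamp wins."""
        current = self.counter.get(owner)
        if current is None or col.timestamp > current.timestamp:
            self.counter[owner] = col

    def read(self) -> int:
        """READ PATH: the counter value is the sum of all per-node columns."""
        return sum(c.value for c in self.counter.values())


# Hypothetical three-node cluster; the counter is replicated to every node.
nodes = {n: Replica(n) for n in ("A", "B", "C")}


def write(coordinator: str, delta: int) -> None:
    """Apply an add/dec on the receiving node, then propagate its column."""
    col = nodes[coordinator].add(delta)
    for replica in nodes.values():
        if replica.name != coordinator:
            replica.apply(coordinator, col)


write("A", 5)
write("B", 3)
write("A", -2)                       # a decrement is just a negative delta
nodes["C"].apply("A", Column(5, 1))  # a stale, replayed column is ignored
print(nodes["C"].read())  # 6
```

Because each node only ever overwrites its own column with its latest running total, replaying an older copy of that column is harmless, which is why the ordinary column propagation and read-repair paths can be reused.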
>> NB: If it's feasible, I might not be the right person to work on it, as I
>> have not touched Cassandra's internals for more than a year. I want to
>> contribute something to help us reach consensus.
>> [1]
>> best regards,
>> hanzhu
>> On Sun, Sep 26, 2010 at 9:49 PM, Jonathan Ellis <> wrote:
>>> You have misunderstood. If we continue the 1072 approach of writing
>>> counter data to the clock field, this is necessarily incompatible with
>>> the right way of writing counter data to the value field. It's no
>>> longer simply a matter of reversing 1070.
>>> On Sat, Sep 25, 2010 at 11:50 PM, Zhu Han <> wrote:
>>>> Jonathan,
>>>> This is a personal email.
>>>> On Sun, Sep 26, 2010 at 1:27 PM, Jonathan Ellis <> wrote:
>>>>> On Sat, Sep 25, 2010 at 8:57 PM, Zhu Han <> wrote:
>>>>>> Can we just get the patch committed but mark it as "alpha" or
>>>>>> "experimental"?
>>>>> I explained exactly why that is not a good approach here:
>>>> Yes, I see. But the clock structure has been in trunk since Cassandra-1070.
>>>> We still need to clean it out, whatever happens, and we need somebody to
>>>> volunteer for this work. Considering the complexity of Cassandra-1070, a
>>>> programmer with in-depth knowledge of this patch is preferable, and it
>>>> will take some time to do it.
>>>> Fortunately, Johan Oskarsson has promised to take it on in a comment on
>>>> Cassandra-1072[1]:
>>>> "The clock changes would get into trunk quicker if we didn't, avoiding the
>>>> extra overhead of a big patch during reviews, merge with trunk, code updates
>>>> and publication of a new patch.
>>>> If the concern is that we won't attend to the clocks once this patch is in I
>>>> can promise that we'll look at it straight away. "
>>>> And if Twitter/Digg/SimpleGeo fork their own Cassandra trees, it will give
>>>> a big marketing opportunity to supporters of other NoSQL systems. As you
>>>> know, the competition is quite fierce right now.
>>>> So, instead of staying stuck in this awkward situation, why not switch to
>>>> another strategy:
>>>>> "Fork another experimental tree from 0.7 beta 1 and accept
>>>>> Cassandra-1072. At the same time, start the cleanup work on this tree.
>>>>> Once it's finalized, merge it back into 0.7, whether that's 0.7.1 or 0.7.2.
>>>>> That way, the folks from Twitter do not need to maintain a huge
>>>>> out-of-tree patch, while the quality impact of Cassandra-1072 is still
>>>>> limited."
>>>> I do know the pain of maintaining a large patch outside the official tree.
>>>> Once it gets in, everybody will feel much better.
>>>> If you give this patch a chance, Johan or others can be highly
>>>> motivated, because the whole community is working together. It's a
>>>> compromise, but it's worth it.
>>>> [1]
>>>>> --
>>>>> Jonathan Ellis
>>>>> Project Chair, Apache Cassandra
>>>>> co-founder of Riptano, the source for professional Cassandra support
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of Riptano, the source for professional Cassandra support
