incubator-cassandra-dev mailing list archives

From Jeremy Hanna <jeremy.hanna1...@gmail.com>
Subject Re: [DISCUSSION] High-volume counters in Cassandra
Date Tue, 28 Sep 2010 17:12:50 GMT
Is there any feedback from Twitter and Digg and perhaps SimpleGeo people about CASSANDRA-1546?
 Would that work so that you wouldn't have to maintain a fork?

On Sep 27, 2010, at 5:25 AM, Sylvain Lebresne wrote:

> In CASSANDRA-1546, I propose an alternative to #1072. At its core,
> it rewrites #1072 without the clocks structure (by splitting the clock into
> individual columns, not unlike what Zhu Han proposed in his preceding
> mail, but in a row instead of a super column, for reasons explained in the
> issue).
> 
> But it is also my belief that it improves on the actual patch of #1072 in
> the following ways:
>  - it supports increments and decrements
>  - it supports the usual consistency levels
>  - it proposes an (optional) solution to the idempotency problem of
>    increments (it's optional because it has a (fairly slight) performance cost
>    that some may want to remove if they understand the risk).
> 
> When I say I propose, I mean that I wrote the patch (attached to the jira
> ticket). I've just written it, so it is really under-tested and has a few
> details here and there to fix, but it should already be fairly functional
> (it passes basic system tests).
> 
> I welcome all comments on the patch. It has been written with the goal of
> addressing most of the concerns that have been raised about those counters
> over the past few months (both in terms of performance and implementation).
> It is my belief that it reaches this goal; hopefully others will agree.
> 
> --
> Sylvain
> 
> On Mon, Sep 27, 2010 at 5:32 AM, Zhu Han <schumi.han@gmail.com> wrote:
>>  I propose a new way to solve the counter problem in cassandra-1502[1].
>> Since I do not follow the jira updates very carefully, I paste it here so
>> that more people can comment on it and we can see whether it's feasible.
>> 
>> "Seems like we have not found a solution acceptable to everybody. I will try to
>> propose a new approach. Let's see whether anybody can shed some light on it
>> and make it a reality.
>> 
>> 1) We add a basic data structure, called a counter, which is a special type
>> of super column.
>> 
>> 2) The name of each column in the counter super column is the host name of
>> a cassandra node, and the value is the calculated result from that node.
>> 
>> 3) WRITE PATH: Once a node receives the add/dec request of a counter, it
>> de-serializes its local counter super column, and updates the column named
>> after itself atomically. After that, it propagates the updated column value to
>> other replicas, just like how the mutation of a normal column is propagated
>> to other replicas. Different consistency levels can be supported as before.
>> 
>> 4) READ PATH: Depending on the consistency level, contact several replicas,
>> read back the counter super column as a whole, and get the latest counter
>> value by summing up all columns in the counter. Read-repair logic can work
>> as before.
>> 
>> IMHO, the biggest advantage of this approach is re-using as many
>> mechanisms already in the code as possible. So it might not be so disruptive.
>> But adding a new thrift API is inevitable. "
>> NB: If it's feasible, I might not be the right man to work on it as I have
>> not touched the internals of cassandra for more than 1 year. I want to
>> contribute something to help us reach consensus.
>> 
>> [1]
>> https://issues.apache.org/jira/browse/CASSANDRA-1502?focusedCommentId=12915103&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12915103
>> 
>> best regards,
>> hanzhu
>> 
>> 
>> On Sun, Sep 26, 2010 at 9:49 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>> 
>>> you have misunderstood.  if we continue the 1072 approach of writing
>>> counter data to the clock field, this is necessarily incompatible with
>>> the right way of writing counter data to the value field.  it's no
>>> longer simply a matter of reversing 1070.
>>> 
>>> On Sat, Sep 25, 2010 at 11:50 PM, Zhu Han <schumi.han@gmail.com> wrote:
>>>> Jonathan,
>>>> 
>>>> This is a personal email.
>>>> 
>>>> On Sun, Sep 26, 2010 at 1:27 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>>> 
>>>>> On Sat, Sep 25, 2010 at 8:57 PM, Zhu Han <schumi.han@gmail.com> wrote:
>>>>>> Can we just let the patch be committed but mark it as "alpha" or
>>>>>> "experimental"?
>>>>> 
>>>>> I explained exactly why that is not a good approach here:
>>>>> http://www.mail-archive.com/dev@cassandra.apache.org/msg00917.html
>>>>> 
>>>> Yes, I see. But the clock structure has been in trunk since Cassandra-1070.
>>>> We still need to clean it out, regardless, and we need somebody to volunteer
>>>> for this work. Considering the complexity of Cassandra-1070, a programmer
>>>> with in-depth knowledge of this patch is preferable. And it will take some
>>>> time to do it.
>>>> 
>>>> Fortunately, Johan Oskarsson has promised to take it on in a comment on
>>>> Cassandra-1072[1]:
>>>> 
>>>> "The clock changes would get into trunk quicker if we didn't, avoiding the
>>>> extra overhead of a big patch during reviews, merge with trunk, code updates
>>>> and publication of a new patch.
>>>> If the concern is that we won't attend to the clocks once this patch is in I
>>>> can promise that we'll look at it straight away. "
>>>> 
>>>> And if twitter/digg/simplegeo fork their own tree of cassandra, this will give
>>>> a big marketing opportunity to supporters of other NOSQL systems. As you know,
>>>> the competition is quite fierce currently.
>>>> 
>>>> So, instead of sticking to the embarrassing situation, why not change to
>>>> another strategy:
>>>> 
>>>>> "Fork another experimental tree from 0.7 beta 1 and accept
>>>>> Cassandra-1072.  At the same time, start the clean up work on this tree.
>>>>> Once it's finalized, merge it back to 0.7, no matter whether it's 0.7.1 or
>>>>> 0.7.2.
>>>>>
>>>>> Hence, these guys from twitter do not need to maintain a huge
>>>>> out-of-tree patch, while the quality impact of cassandra-1072 is still
>>>>> limited."
>>>> 
>>>> I do know the pain of maintaining a large patch out of the official tree.
>>>> Once it gets in, everybody will feel much better.
>>>> 
>>>> If you give this patch a chance, Johan or others can be highly
>>>> motivated because the whole community works together.  It's a compromise,
>>>> but it's worth it.
>>>> 
>>>> [1]
>>>> https://issues.apache.org/jira/browse/CASSANDRA-1072?focusedCommentId=12909234&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12909234
>>>> 
>>>> 
>>>>> 
>>>>> --
>>>>> Jonathan Ellis
>>>>> Project Chair, Apache Cassandra
>>>>> co-founder of Riptano, the source for professional Cassandra support
>>>>> http://riptano.com
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of Riptano, the source for professional Cassandra support
>>> http://riptano.com
>>> 
>> 
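
For readers following the thread, the per-node column scheme in Zhu Han's steps 1) to 4) above can be sketched roughly as below. This is only an illustration and not code from any of the patches discussed: the class and function names are hypothetical, and the integer version used to choose between two copies of the same column is an assumption standing in for however normal columns are reconciled.

```python
class CounterReplica:
    """One node's local copy of a counter: host name -> (value, version).

    Hypothetical illustration only; the version number is an assumed
    stand-in for normal column reconciliation.
    """

    def __init__(self, node_name):
        self.node_name = node_name
        self.columns = {}

    def add(self, delta):
        """WRITE PATH: update only the column named after this node."""
        value, version = self.columns.get(self.node_name, (0, 0))
        updated = (value + delta, version + 1)
        self.columns[self.node_name] = updated
        # Return the column so the caller can propagate it to other replicas.
        return self.node_name, updated

    def apply(self, column_name, column):
        """Receive a propagated column and keep whichever copy is newer."""
        current = self.columns.get(column_name, (0, 0))
        if column[1] > current[1]:
            self.columns[column_name] = column

    def read(self):
        return dict(self.columns)


def read_counter(replicas):
    """READ PATH: merge columns from the contacted replicas, then sum them."""
    merged = {}
    for replica in replicas:
        for name, column in replica.read().items():
            if name not in merged or column[1] > merged[name][1]:
                merged[name] = column
    return sum(value for value, _ in merged.values())


a = CounterReplica("node-a")
b = CounterReplica("node-b")
b.apply(*a.add(5))    # increment on node-a, propagated to node-b
a.apply(*b.add(-2))   # decrement on node-b, propagated to node-a
a.add(1)              # increment on node-a that has not been propagated yet
```

In this sketch, a read that contacts both replicas sees the unpropagated increment, while a read that contacts only node-b does not. That is the usual consistency-level trade-off the proposal says it would inherit from normal columns.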

