incubator-cassandra-dev mailing list archives

From Sylvain Lebresne <sylv...@yakaz.com>
Subject Re: [DISCUSSION] High-volume counters in Cassandra
Date Mon, 27 Sep 2010 10:25:10 GMT
In CASSANDRA-1546, I propose an alternative to #1072. At its core,
it rewrites #1072 without the clock structure (by splitting the clock into
individual columns, not unlike what Zhu Han proposed in his preceding
mail, but in a row instead of a super column, for reasons explained in the
issue).

But I also believe that it improves on the current patch for #1072 in
the following ways:
  - it supports increments and decrements
  - it supports the usual consistency levels
  - it proposes an (optional) solution to the idempotency problem of
    increments (it's optional because it has a (fairly slight) performance cost
    that some may want to remove if they understand the risk).
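
To make the per-node column idea concrete, here is a minimal Python sketch.
It is not code from the patch: the CounterReplica class, the (version,
subtotal) column layout and the node names are assumptions made for
illustration only. Each node updates only its own column, replicas keep the
newest version of every column, and a read sums the columns.

```python
class CounterReplica:
    """One replica's copy of a counter: one column per node (hypothetical)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.columns = {}  # node name -> (version, subtotal)

    def add(self, delta):
        # A node only ever rewrites its own column; delta may be negative.
        version, subtotal = self.columns.get(self.node_id, (0, 0))
        column = (version + 1, subtotal + delta)
        self.columns[self.node_id] = column
        return self.node_id, column  # what would be sent to other replicas

    def apply(self, node_id, column):
        # Keep the newest version of a column; stale or duplicate
        # deliveries of the same column are ignored.
        current = self.columns.get(node_id, (0, 0))
        if column[0] > current[0]:
            self.columns[node_id] = column

    def value(self):
        # The counter value is the sum of every node's subtotal.
        return sum(subtotal for _, subtotal in self.columns.values())


a = CounterReplica("node-a")
b = CounterReplica("node-b")
b.apply(*a.add(5))    # increment on node-a, replicated to node-b
a.apply(*b.add(-2))   # decrement on node-b, replicated to node-a
update = a.add(1)
b.apply(*update)
b.apply(*update)      # replaying the same replicated column changes nothing
print(a.value(), b.value())
```

Replaying a replicated column is harmless in this sketch, but a client that
retries add() after a timeout would still be counted twice, which is the
idempotency problem mentioned above.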

When I say I propose, I mean that I actually wrote the patch (attached to the
JIRA ticket). I've just written it, so it is really under-tested and has a
few details here and there to fix, but it should already be fairly
functional (it passes basic system tests).

I welcome all comments on the patch. It was written with the goal of
addressing most of the concerns that have been raised about these counters
over the past few months (both in terms of performance and implementation).
It is my belief that it reaches this goal; hopefully others will agree.

--
Sylvain

On Mon, Sep 27, 2010 at 5:32 AM, Zhu Han <schumi.han@gmail.com> wrote:
> I propose a new way to solve the counter problem in cassandra-1502[1].
> Since I do not follow the JIRA updates very carefully, I paste it here so
> that more people can comment on it and we can see whether it's feasible.
>
> "Seems like we have not found a solution acceptable to everybody. I will try
> to propose a new approach. Let's see whether anybody can shed some light on
> it and make it a reality.
>
> 1) We add a basic data structure, called a counter, which is a special type
> of super column.
>
> 2) The name of each column in the counter super column is the host name of
> a Cassandra node, and the value is the calculated result from that node.
>
> 3) WRITE PATH: Once a node receives an add/dec request for a counter, it
> deserializes its local counter super column and atomically updates the
> column named after itself. After that, it propagates the updated column
> value to the other replicas, just as a mutation of a normal column is
> propagated. Different consistency levels can be supported as before.
>
> 4) READ PATH: Depending on the consistency level, contact several replicas,
> read back the counter super column as a whole, and get the latest counter
> value by summing up all columns in the counter. Read-repair logic can work
> as before.
>
> IMHO, the biggest advantage of this approach is that it re-uses as many
> mechanisms already in the code as possible, so it might not be so
> disruptive. But adding a new Thrift API is inevitable. "
> NB: If it's feasible, I might not be the right man to work on it, as I have
> not touched the internals of Cassandra for more than a year. I want to
> contribute something to help us reach consensus.
>
> [1]
> https://issues.apache.org/jira/browse/CASSANDRA-1502?focusedCommentId=12915103&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12915103
>
> best regards,
> hanzhu
>
>
> On Sun, Sep 26, 2010 at 9:49 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>
>> you have misunderstood.  if we continue the 1072 approach of writing
>> counter data to the clock field, this is necessarily incompatible with
>> the right way of writing counter data to the value field.  it's no
>> longer simply a matter of reversing 1070.
>>
>> On Sat, Sep 25, 2010 at 11:50 PM, Zhu Han <schumi.han@gmail.com> wrote:
>> > Jonathan,
>> >
>> > This is a personal email.
>> >
>> > On Sun, Sep 26, 2010 at 1:27 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>> >>
>> >> On Sat, Sep 25, 2010 at 8:57 PM, Zhu Han <schumi.han@gmail.com> wrote:
>> >> > Can we just let the patch be committed but mark it as "alpha" or
>> >> > "experimental"?
>> >>
>> >> I explained exactly why that is not a good approach here:
>> >> http://www.mail-archive.com/dev@cassandra.apache.org/msg00917.html
>> >>
>> > Yes, I see. But the clock structure has been in trunk since Cassandra-1070.
>> > We still need to clean it out regardless, and we need somebody to volunteer
>> > for this work. Considering the complexity of Cassandra-1070, a programmer
>> > who has in-depth knowledge of that patch is preferable. And it will take
>> > some time to do.
>> >
>> > Fortunately, Johan Oskarsson has promised to take it on in a comment on
>> > Cassandra-1072[1]:
>> >
>> > "The clock changes would get into trunk quicker if we didn't, avoiding the
>> > extra overhead of a big patch during reviews, merge with trunk, code updates
>> > and publication of a new patch.
>> > If the concern is that we won't attend to the clocks once this patch is in I
>> > can promise that we'll look at it straight away. "
>> >
>> > And if twitter/digg/simplegeo fork their own tree of Cassandra, this will
>> > give a big marketing opportunity to supporters of other NoSQL systems. As
>> > you know, the competition is quite fierce currently.
>> >
>> > So, instead of staying stuck in this awkward situation, why not change to
>> > another strategy:
>> >
>> >> "Fork another experimental tree from 0.7 beta 1 and accept
>> >> Cassandra-1072.  At the same time, start the clean up work on this tree.
>> >> Once it's finalized, merge them back to 0.7, whether it's 0.7.1 or 0.7.2.
>> >>
>> >> Hence, these guys from twitter do not need to maintain a huge
>> >> out-of-tree patch, while the quality impact of cassandra-1072 is still
>> >> limited."
>> >
>> > I do know the pain of maintaining a large patch outside the official tree.
>> > Once it gets in, everybody will feel much better.
>> >
>> > If you give this patch a chance, Johan or others can be highly
>> > motivated because the whole community is working together.  It's a
>> > compromise, but it's worth it.
>> >
>> > [1]
>> > https://issues.apache.org/jira/browse/CASSANDRA-1072?focusedCommentId=12909234&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12909234
>> >
>> >
>> >>
>> >> --
>> >> Jonathan Ellis
>> >> Project Chair, Apache Cassandra
>> >> co-founder of Riptano, the source for professional Cassandra support
>> >> http://riptano.com
>> >
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>>
>
