cassandra-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin>
Subject Re: cassandra increment counters, Jira #1072
Date Fri, 13 Aug 2010 15:49:17 GMT
On Fri, Aug 13, 2010 at 6:24 AM, Jonathan Ellis <> wrote:
>> This is simply not an acceptable alternative and just can't be called
>> handling it "well".
> What part is it handling poorly, at a technical level?  This is almost
> exactly what 1072 does internally -- we are concerned here with the
> high write, low read volume case.

Requiring clients directly manage the counter rows in order to
periodically compress or segment them.  Yes, you can emulate the
behavior.  No, that is not handling it well.

>>  It is equivalent to "make the users do it", which
>> is the case for almost anything.
> I strongly feel we should be in the business of providing building
> blocks, not special cases on top of that.  (But see below, I *do*
> think the 580 version vectors is the kind of building block we want!)

I agree, 580 is really valuable and should be in.  The problem for
high write rate, distributed counters is the requirement of read
before write inherent in such vector-based approaches.  Am I missing
some aspect of 580 that precludes that?

>>  The reasons #1072 is so valuable:
>> 1) Does not require _any_ user action.
> This can be addressed at the library level.  Just as our first stab at
> ZK integration was a rather clunky patch; "cages" is better.

Certainly, but it would be hard to argue (and I am not) that the
tightly synchronized behavior of ZK is a good match for Cassandra
(mixing in Paxos could make for some neat options, but that's another

>> 2) Does not change the EC-centric model of Cassandra.
> It does, though.  1072 is *not* a version vector-based approach --
> that would be 580.  Read the 1072 design doc, if you haven't.  (Thanks
> to Kelvin for writing that up!)

Nor is Cassandra right now.  I know 1072 isn't vector based, and I
think that is in its favor _for this application_.

> I'm referring in particular to reads requiring CL.ALL.  (My
> understanding is that in the previous design, a "master" replica was
> chosen and was always written to first.)  Both of these break "the
> EC-centric model" and that is precisely the objection I made when I
> said "ConsistencyLevel is not respected."  I don't think this is
> fixable in the 1072 approach.  I would be thrilled to be wrong.

It is EC in that the total for a counter is unknown until resolved on
read.  Yes, it does not respect CL, but since it can only be used in 1
way, I don't see that as a disadvantage.

>>> The second is that the approach in 1072 resembles an entirely separate
>>> system that happens to use part of Cassandra infrastructure -- the
>>> thrift API, the MessagingService, the sstable format -- but isn't
>>> really part of it.  ConsistencyLevel is not respected, and special
>>> cases abound to weld things in that don't fit, e.g. the AES/Streaming
>>> business.
>> Then let's find ways to make it as elegant as it can be.  Ultimately,
>> this functionality needs to be in Cassandra or users will simply
>> migrate someplace else for this extremely common use case.
> This is what I've been pushing for.  The version vector approach to
> counting (i.e. 580 as opposed to 1072) is exactly the more elegant,
> EC-centric approach that addresses a case that we *don't* currently
> handle well (counters with a higher read volume than 1072).

Perhaps I missed something: does counting 580 require read before
counter update (local to the node, not a client read)?


View raw message