incubator-cassandra-dev mailing list archives

From Benjamin Black...@b3k.us>
Subject Re: cassandra increment counters, Jira #1072
Date Fri, 13 Aug 2010 15:49:17 GMT
On Fri, Aug 13, 2010 at 6:24 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>
>> This is simply not an acceptable alternative and just can't be called
>> handling it "well".
>
> What part is it handling poorly, at a technical level?  This is almost
> exactly what 1072 does internally -- we are concerned here with the
> high write, low read volume case.
>

Requiring clients to directly manage the counter rows in order to
periodically compress or segment them.  Yes, you can emulate the
behavior.  No, that is not handling it well.
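For illustration, a minimal sketch of what that client-side emulation tends to look like. It assumes a hypothetical in-memory stand-in for a counter row in which every increment is its own column; the class name, the `compact_threshold` parameter and the `"compacted"` column name are invented for this example and come from neither patch.

```python
import uuid
from collections import defaultdict


class ClientManagedCounter:
    """Hypothetical stand-in for a counter row made of one column per increment."""

    def __init__(self, compact_threshold=4):
        self.rows = defaultdict(dict)  # row key -> {column name: delta}
        self.compact_threshold = compact_threshold

    def increment(self, key, delta=1):
        # Blind write: each increment lands in a new, uniquely named column.
        self.rows[key][uuid.uuid4().hex] = delta
        # The client itself has to notice the row growing and clean it up.
        if len(self.rows[key]) > self.compact_threshold:
            self.compact(key)

    def read(self, key):
        # A read sums every column still present in the row.
        return sum(self.rows[key].values())

    def compact(self, key):
        # Collapse all columns into a single one holding the running total.
        total = self.read(key)
        self.rows[key] = {"compacted": total}


counter = ClientManagedCounter(compact_threshold=3)
for _ in range(10):
    counter.increment("hits")
print(counter.read("hits"))
```

The bookkeeping in `compact` is the part every client would have to carry; against a real cluster it would also have to cope with concurrent writers, which this sketch ignores.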

>>  It is equivalent to "make the users do it", which
>> is the case for almost anything.
>
> I strongly feel we should be in the business of providing building
> blocks, not special cases on top of that.  (But see below, I *do*
> think the 580 version vectors is the kind of building block we want!)
>

I agree, 580 is really valuable and should be in.  The problem for
high-write-rate distributed counters is the read-before-write
requirement inherent in such vector-based approaches.  Am I missing
some aspect of 580 that precludes that?
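To make the concern concrete, here is a rough sketch of the read-before-write step a versioned counter update implies. It assumes a hypothetical store in which each node's entry is a `(version, count)` pair; it is not taken from the 580 patch, and the names are made up for this example.

```python
class VersionedCounter:
    """Hypothetical counter keyed by node id, each entry a (version, count) pair."""

    def __init__(self):
        self.entries = {}     # node id -> (version, count)
        self.local_reads = 0  # how many reads the updates have cost so far

    def increment(self, node_id, delta=1):
        # Read before write: the current entry is needed to build the next one.
        self.local_reads += 1
        version, count = self.entries.get(node_id, (0, 0))
        self.entries[node_id] = (version + 1, count + delta)

    def merge(self, other):
        # Reconciliation: for each node, the entry with the higher version wins.
        for node_id, entry in other.entries.items():
            if entry[0] > self.entries.get(node_id, (0, 0))[0]:
                self.entries[node_id] = entry

    def value(self):
        return sum(count for _, count in self.entries.values())


a = VersionedCounter()
a.increment("n1")
a.increment("n1", 2)
b = VersionedCounter()
b.increment("n2", 5)
a.merge(b)
print(a.value(), a.local_reads)
```

Every `increment` costs one read of existing state, which is the overhead in question for a write-heavy counter.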

>>  The reasons #1072 is so valuable:
>>
>> 1) Does not require _any_ user action.
>
> This can be addressed at the library level.  Just as our first stab at
> ZK integration was a rather clunky patch; "cages" is better.
>

Certainly, but it would be hard to argue (and I am not) that the
tightly synchronized behavior of ZK is a good match for Cassandra
(mixing in Paxos could make for some neat options, but that's another
debate...).

>> 2) Does not change the EC-centric model of Cassandra.
>
> It does, though.  1072 is *not* a version vector-based approach --
> that would be 580.  Read the 1072 design doc, if you haven't.  (Thanks
> to Kelvin for writing that up!)
>

Nor is Cassandra right now.  I know 1072 isn't vector based, and I
think that is in its favor _for this application_.

> I'm referring in particular to reads requiring CL.ALL.  (My
> understanding is that in the previous design, a "master" replica was
> chosen and was always written to first.)  Both of these break "the
> EC-centric model" and that is precisely the objection I made when I
> said "ConsistencyLevel is not respected."  I don't think this is
> fixable in the 1072 approach.  I would be thrilled to be wrong.
>

It is EC in that the total for a counter is unknown until resolved on
read.  Yes, it does not respect CL, but since it can only be used in
one way, I don't see that as a disadvantage.
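One way to picture "unknown until resolved on read", as a sketch only: assume each replica keeps a partial sum of the deltas it accepted, and a read adds up the partials from every replica. That is one reading of this discussion, not the actual 1072 design; `Replica` and `resolve_on_read` are hypothetical names.

```python
class Replica:
    """Hypothetical replica holding only the deltas it accepted itself."""

    def __init__(self):
        self.partial = {}  # counter name -> sum of locally accepted deltas

    def apply(self, name, delta):
        # The write touches only this replica's partial sum; no other
        # replica is consulted.
        self.partial[name] = self.partial.get(name, 0) + delta


def resolve_on_read(replicas, name):
    # The total exists only once the partial sums are added together.
    return sum(r.partial.get(name, 0) for r in replicas)


replicas = [Replica() for _ in range(3)]
replicas[0].apply("hits", 2)
replicas[1].apply("hits", 3)
replicas[2].apply("hits", 1)
print(resolve_on_read(replicas, "hits"))      # all replicas consulted
print(resolve_on_read(replicas[:2], "hits"))  # one replica left out
```

Under that assumption, leaving any replica out of the read undercounts, which is why such a scheme reads from all replicas instead of honouring a weaker ConsistencyLevel.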

>>> The second is that the approach in 1072 resembles an entirely separate
>>> system that happens to use part of Cassandra infrastructure -- the
>>> thrift API, the MessagingService, the sstable format -- but isn't
>>> really part of it.  ConsistencyLevel is not respected, and special
>>> cases abound to weld things in that don't fit, e.g. the AES/Streaming
>>> business.
>>
>> Then let's find ways to make it as elegant as it can be.  Ultimately,
>> this functionality needs to be in Cassandra or users will simply
>> migrate someplace else for this extremely common use case.
>
> This is what I've been pushing for.  The version vector approach to
> counting (i.e. 580 as opposed to 1072) is exactly the more elegant,
> EC-centric approach that addresses a case that we *don't* currently
> handle well (counters with a higher read volume than 1072).
>

Perhaps I missed something: does counting with 580 require a read
before each counter update (local to the node, not a client read)?


b
