On Fri, Aug 13, 2010 at 6:24 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>
>> This is simply not an acceptable alternative and just can't be called
>> handling it "well".
>
> What part is it handling poorly, at a technical level? This is almost
> exactly what 1072 does internally -- we are concerned here with the
> high write, low read volume case.
>
Requiring clients to directly manage the counter rows in order to
periodically compress or segment them. Yes, you can emulate the
behavior. No, that is not handling it well.
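
To make the burden concrete, here is a rough sketch of what that
emulation looks like from the client side. It uses a plain dict as a
stand-in for a counter row; the function names and the "summary" column
are hypothetical, and nothing here is a real Cassandra client API.

```python
import uuid

# Hypothetical client-side emulation of a counter row.
# A dict stands in for the row: column name -> delta.

def increment(row, delta):
    # Every increment becomes a new, uniquely named column (a blind write).
    row[uuid.uuid4().hex] = delta

def read_total(row):
    # A read has to sum every column that has accumulated so far.
    return sum(row.values())

def compress(row):
    # The client must periodically fold the columns into one summary
    # column, or the row grows without bound. In a real cluster this
    # step would also have to cope with concurrent writers; that is
    # not modeled here.
    total = sum(row.values())
    row.clear()
    row["summary"] = total
```

The increments themselves are cheap; it is the compress step, and
deciding when and where to run it, that lands on every user.
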
>> It is equivalent to "make the users do it", which
>> is the case for almost anything.
>
> I strongly feel we should be in the business of providing building
> blocks, not special cases on top of that. (But see below, I *do*
> think the 580 version vectors is the kind of building block we want!)
>
I agree, 580 is really valuable and should be in. The problem for
high-write-rate distributed counters is the read before write that
such vector-based approaches inherently require. Am I missing
some aspect of 580 that precludes that?
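
A toy illustration of the distinction I mean, with the caveat that
neither class is the 580 or the 1072 code. Both are hypothetical
in-memory models, and the `reads` field exists only to count local
reads. The first model has to look at its current state on every
write; the second appends deltas blindly and resolves the total only
when someone reads it.

```python
# Hypothetical sketch; not the CASSANDRA-580 or CASSANDRA-1072 code.

class VersionedCounter:
    """Each write stores a new full value plus a per-node clock,
    so the current state must be read before it can be written."""

    def __init__(self):
        self.value = 0
        self.clock = {}   # node id -> logical version
        self.reads = 0    # local reads performed so far

    def increment(self, node, delta):
        self.reads += 1               # read before write
        current = self.value
        self.clock[node] = self.clock.get(node, 0) + 1
        self.value = current + delta


class DeltaCounter:
    """Each write is a blind append of a delta; the total is
    unknown until it is resolved on read."""

    def __init__(self):
        self.deltas = []
        self.reads = 0

    def increment(self, node, delta):
        self.deltas.append((node, delta))   # no read on the write path

    def total(self):
        self.reads += 1
        return sum(delta for _, delta in self.deltas)
```

With N increments and one final read, the first model does N local
reads and the second does one. That ratio is the whole argument for
the high write, low read case.
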
>> The reasons #1072 is so valuable:
>>
>> 1) Does not require _any_ user action.
>
> This can be addressed at the library level. Just as our first stab at
> ZK integration was a rather clunky patch; "cages" is better.
>
Certainly, but it would be hard to argue (and I am not) that the
tightly synchronized behavior of ZK is a good match for Cassandra
(mixing in Paxos could make for some neat options, but that's another
debate...).
>> 2) Does not change the EC-centric model of Cassandra.
>
> It does, though. 1072 is *not* a version vector-based approach --
> that would be 580. Read the 1072 design doc, if you haven't. (Thanks
> to Kelvin for writing that up!)
>
Nor is Cassandra right now. I know 1072 isn't vector based, and I
think that is in its favor _for this application_.
> I'm referring in particular to reads requiring CL.ALL. (My
> understanding is that in the previous design, a "master" replica was
> chosen and was always written to first.) Both of these break "the
> EC-centric model" and that is precisely the objection I made when I
> said "ConsistencyLevel is not respected." I don't think this is
> fixable in the 1072 approach. I would be thrilled to be wrong.
>
It is EC in that the total for a counter is unknown until resolved on
read. Yes, it does not respect CL, but since it can only be used in one
way, I don't see that as a disadvantage.
>>> The second is that the approach in 1072 resembles an entirely separate
>>> system that happens to use part of Cassandra infrastructure -- the
>>> thrift API, the MessagingService, the sstable format -- but isn't
>>> really part of it. ConsistencyLevel is not respected, and special
>>> cases abound to weld things in that don't fit, e.g. the AES/Streaming
>>> business.
>>
>> Then let's find ways to make it as elegant as it can be. Ultimately,
>> this functionality needs to be in Cassandra or users will simply
>> migrate someplace else for this extremely common use case.
>
> This is what I've been pushing for. The version vector approach to
> counting (i.e. 580 as opposed to 1072) is exactly the more elegant,
> EC-centric approach that addresses a case that we *don't* currently
> handle well (counters with a higher read volume than 1072).
>
Perhaps I missed something: does counting with 580 require a read
before each counter update (local to the node, not a client read)?
b