incubator-cassandra-user mailing list archives

From Milind Parikh <milindpar...@gmail.com>
Subject Re: rainbird question (why is the 1minute buffer needed?)
Date Sun, 22 May 2011 19:53:21 GMT
I believe that the key reason is souped-up performance for the most recent data.
And yes, an "intelligent flush" leaves you vulnerable to some data loss.

/***********************
sent from my android...please pardon occasional typos as I respond @ the
speed of thought
************************/

On May 22, 2011 11:01 AM, "Yang" <teddyyyy123@gmail.com> wrote:

Thanks,

I did read through that PDF doc and went through the counters code in
0.8-rc2; I think I understand the logic in that code.

In my hypothetical implementation, I am not suggesting bypassing the
complicated logic in the counters code: the extra module would still
enter the increment through StorageProxy.mutate(
My_counter.delta=1 ), so the logical clock is still handled by
the Counters code.

The only difference is, as you said, that Rainbird collapses many +1
deltas. But my claim is that this "collapsing" is in fact already done
by Cassandra, since the write always hits the memtable first. So
collapsing in the Cassandra memtable vs. collapsing in Rainbird's memory
takes the same time, while Rainbird introduces an extra level of
caching. (I strongly suspect that Rainbird is vulnerable to losing up to
one minute's worth of data if it dies before the writes are flushed to
Cassandra, unless it implements its own commit log; but that would be
re-implementing many of the wheels in Cassandra....)
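To make the "collapsing is the same work either way" point concrete, here is a minimal, hypothetical sketch (not Rainbird's or Cassandra's actual code) of an in-memory buffer that merges repeated +1 deltas per key until a flush; the class and method names are illustrative assumptions:

```python
from collections import defaultdict

class CollapsingBuffer:
    """Hypothetical sketch: accumulates counter deltas in memory,
    collapsing repeated increments to the same key into one running
    total -- the same merge a memtable performs on counter writes."""

    def __init__(self):
        self.deltas = defaultdict(int)

    def increment(self, key, delta=1):
        # Many +1 writes to the same key collapse into a single entry.
        self.deltas[key] += delta

    def flush(self):
        # On flush, each key emits one collapsed delta downstream.
        # Anything still buffered here is lost if the process dies
        # before flushing -- hence the commit-log question above.
        out = dict(self.deltas)
        self.deltas.clear()
        return out

buf = CollapsingBuffer()
for _ in range(1000):
    buf.increment("pageviews/example.com")
print(buf.flush())  # {'pageviews/example.com': 1000}
```

Whether this map lives in a Rainbird process or in a Cassandra memtable, the per-increment cost is the same hash-map update; the difference is only which process holds the unflushed data at risk.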


At one time I thought the reason was probably that, from one given URL,
Rainbird needs to create writes on many keys, so the keys need to go to
different Cassandra nodes. But later I found that this can also be done
in a module on the coordinator, since the client request first hits a
coordinator rather than a data node; in fact, in a multi-insert case,
the coordinator already sends the request to multiple data nodes. The
extra module I am proposing simply translates a single insert into a
multi-insert, and then Cassandra takes over from there.
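A rough sketch of that proposed coordinator-side translation, with an illustrative hierarchical key scheme (the function names, key layout, and mutate hook are all assumptions, not Rainbird's or Cassandra's actual API):

```python
from urllib.parse import urlparse

def fan_out_keys(url):
    """Expand one URL hit into the several counter keys a single
    event should increment (a global total, a per-domain count,
    and a per-path count -- an illustrative scheme only)."""
    parsed = urlparse(url)
    return [
        "hits/total",
        f"hits/domain/{parsed.netloc}",
        f"hits/path/{parsed.netloc}{parsed.path}",
    ]

def record_hit(url, mutate):
    # 'mutate' stands in for whatever issues the counter increment,
    # e.g. a call that ultimately reaches StorageProxy.mutate with
    # delta=1; here it is just a callback for demonstration.
    for key in fan_out_keys(url):
        mutate(key, 1)

increments = []
record_hit("http://example.com/news/story",
           lambda key, delta: increments.append((key, delta)))
print(increments)
# [('hits/total', 1), ('hits/domain/example.com', 1),
#  ('hits/path/example.com/news/story', 1)]
```

The coordinator would then route each resulting key increment to the node owning that key, exactly as it already does for a client-issued multi-insert.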


Thanks
Yang


On Sun, May 22, 2011 at 3:47 AM, aaron morton <aaron@thelastpickle.com>
wrote:
>  The implementatio...
