Subject: Re: rainbird question (why is the 1minute buffer needed?)
From: Milind Parikh <milindparikh@gmail.com>
To: user@cassandra.apache.org
Date: Sun, 22 May 2011 15:53:21 -0400

I believe that the key reason is souped-up performance for the most recent data. And yes, "an intelligent flush" leaves you vulnerable to some data loss.

/***********************
sent from my android... please pardon occasional typos as I respond @ the speed of thought
************************/

On May 22, 2011 11:01 AM, "Yang" <teddyyyy123@gmail.com> wrote:

Thanks,

I did read through that PDF doc and went through the counters code in 0.8-rc2; I think I understand the logic in that code.

In my hypothetical implementation, I am not suggesting to sidestep the complicated logic in the counters code: the extra module would still enter each increment through StorageProxy.mutate( My_counter.delta=1 ), so the logical clock is still handled by the Counters code.

The only difference is, as you said, that Rainbird collapses many +1 deltas.
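As a rough illustration of the "collapsing many +1 deltas" point, here is a hypothetical in-memory sketch (the class and method names are mine, not Cassandra's actual Memtable code): repeated increments against the same counter key merge into a single in-memory cell, so a thousand +1 writes occupy one entry until flush.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, NOT Cassandra's real Memtable: it only models the
// claim that many small deltas for one key collapse into one cell in memory.
class CounterMemtable {
    private final Map<String, Long> cells = new ConcurrentHashMap<>();

    // Apply one increment; repeated deltas for the same key merge in place.
    void applyDelta(String key, long delta) {
        cells.merge(key, delta, Long::sum);
    }

    // Value that would eventually be flushed out of memory.
    long currentValue(String key) {
        return cells.getOrDefault(key, 0L);
    }

    // Number of distinct in-memory cells, regardless of how many deltas arrived.
    int distinctCells() {
        return cells.size();
    }
}
```

Under this model, buffering in Rainbird's memory and buffering in the memtable do the same merge work, which is the core of Yang's argument.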
But my claim is that this "collapsing" is in fact already done by Cassandra, since the write always hits the memtable first; collapsing in the Cassandra memtable and collapsing in Rainbird's memory take the same time, while Rainbird introduces an extra level of caching. (I strongly suspect that Rainbird is vulnerable to losing up to one minute's worth of data if it dies before the writes are flushed to Cassandra, unless it implements its own commit log, but that would mean reimplementing many of Cassandra's wheels.)

I thought at one time that the reason was that, from one given URL, Rainbird needs to create writes on many keys, so the keys need to go to different Cassandra nodes. But later I found that this can also be done in a module on the coordinator, since the client request first hits a coordinator rather than a data node; in fact, in the multi-insert case the coordinator already sends the request to multiple data nodes. The extra module I am proposing simply translates a single insert into a multi-insert, and Cassandra takes over from there.

Thanks
Yang

On Sun, May 22, 2011 at 3:47 AM, aaron morton <aaron@thelastpickle.com> wrote:
> The implementatio...
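The coordinator module Yang proposes could look roughly like this sketch (the names and the minute/hour/day bucket scheme are illustrative assumptions, not a real Cassandra or Rainbird API): one URL-hit event expands into increments against several rollup keys, and the normal multi-insert write path then routes each key to its own replicas.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical coordinator-side expander: translates a single insert
// into a multi-insert batch. Bucket granularities are an assumption
// made for illustration only.
class RollupExpander {
    // One increment destined for one row key (nested records are implicitly static).
    record Increment(String rowKey, long delta) {}

    // One event at epoch-second `ts` becomes one increment per time bucket;
    // each row key may land on a different Cassandra node.
    static List<Increment> expand(String url, long ts) {
        List<Increment> batch = new ArrayList<>();
        batch.add(new Increment(url + ":m:" + (ts / 60), 1));    // per-minute bucket
        batch.add(new Increment(url + ":h:" + (ts / 3600), 1));  // per-hour bucket
        batch.add(new Increment(url + ":d:" + (ts / 86400), 1)); // per-day bucket
        return batch;
    }
}
```

Handing this batch to the existing multi-insert path is the whole proposal: the fan-out that Rainbird does in its own buffer happens on the coordinator instead, with no extra caching layer to lose on a crash.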