From: Vitalii Tymchyshyn
To: user@cassandra.apache.org
Date: Thu, 4 Apr 2013 22:53:06 +0300
Subject: Re: Any plans for read-before-write update operations in CQL3?

A scheme just came to mind that looks interesting, so I want to share it:
1) Actions are introduced. Each action receives a unique ID at the coordinator node. A client can ask for a block of IDs beforehand, to make actions idempotent.
2) Actions are applied to a given row+column value. A special column family type that supports actions may need to be created.
3) Actions are stored for the grace period to ensure repair works correctly.
4) Along with all the actions for the grace period, the old value, the current value and the old value hash are stored.
5) The old value is the value without the currently stored actions; the current value has all currently stored actions applied.
6) The old value hash holds the number of actions applied, the time of the last action applied and a hash of the IDs of all applied actions (only actions applied to the old value, of course).
7) The current value is updated on read, so there can be actions that are not applied yet. On read, if there are unapplied actions, they are applied and the information about the current value and applied actions is updated.
8) Actions may or may not rely on order. If actions rely on order and an update needs to apply an out-of-order action, the value is recalculated starting from the old value.
9) During repair, the highest old value (by number of actions applied, then lowest by time) is selected. Then all actions older than or of the same time as the old value are dropped as already applied. Newer ones are merged into a union set.
10) During compaction, the old value is moved forward to the time (now minus grace period).
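To make steps 1, 5, 7 and 8 a bit more concrete, here is a rough Python sketch of a single cell. All names (ActionCell, write, read) are made up for illustration and nothing here is an existing Cassandra API. It assumes integer values and order-dependent actions, so every read that sees unapplied actions recalculates from the old value.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set, Tuple


@dataclass
class ActionCell:
    """Hypothetical cell: an old value plus the actions stored for the grace period."""
    old_value: int
    # action id -> (timestamp, function applied to the value)
    actions: Dict[str, Tuple[int, Callable[[int], int]]] = field(default_factory=dict)
    current_value: Optional[int] = None          # cached result of the last read
    applied_ids: Set[str] = field(default_factory=set)

    def write(self, action_id: str, timestamp: int, fn: Callable[[int], int]) -> None:
        # Step 1: a retry with the same id is a no-op, which keeps writes idempotent.
        self.actions.setdefault(action_id, (timestamp, fn))

    def read(self) -> int:
        # Step 7: the current value is only brought up to date on read.
        if self.current_value is None or self.applied_ids != set(self.actions):
            # Step 8: order matters here, so recalculate from the old value,
            # applying actions by (timestamp, id).
            value = self.old_value
            for action_id in sorted(self.actions, key=lambda a: (self.actions[a][0], a)):
                value = self.actions[action_id][1](value)
            self.current_value = value
            self.applied_ids = set(self.actions)
        return self.current_value


cell = ActionCell(old_value=10)
cell.write("a1", 1, lambda v: v + 1)
cell.write("a1", 1, lambda v: v + 1)   # duplicate id, ignored
cell.write("a2", 2, lambda v: v * 2)
print(cell.read())                      # (10 + 1) * 2 = 22
```

An action arriving later with an older timestamp simply changes the result of the next read, since the whole chain is replayed from the old value.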
The scheme looks solid. The downside is that all the values for the grace period must be stored. Maybe it should be combined with some auto-confirmation mechanism in which the coordinator, after receiving acks for all the writes, does a second round announcing that the action is fully written. This should work for hinted handoff too. Then the old value can be advanced to the last acked action.
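The repair rule from step 9 could look roughly like the Python sketch below. ReplicaState and repair are hypothetical names, and timestamps are plain integers. This is only my reading of the rule: pick the old value with the most applied actions (earliest time on a tie), drop stored actions that are not newer than it, and union the rest.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ReplicaState:
    """Hypothetical per-replica state for one cell."""
    old_value: int
    applied_count: int        # number of actions folded into old_value
    last_applied_ts: int      # time of the last action folded into old_value
    pending: Dict[str, int]   # stored action id -> timestamp


def repair(replicas: List[ReplicaState]) -> ReplicaState:
    # Highest by number of applied actions, then lowest by time.
    winner = max(replicas, key=lambda r: (r.applied_count, -r.last_applied_ts))
    merged: Dict[str, int] = {}
    for replica in replicas:
        for action_id, ts in replica.pending.items():
            # Actions older than or at the winner's time count as already applied.
            if ts > winner.last_applied_ts:
                merged[action_id] = ts
    return ReplicaState(winner.old_value, winner.applied_count,
                        winner.last_applied_ts, merged)


a = ReplicaState(old_value=5, applied_count=2, last_applied_ts=100,
                 pending={"x": 90, "y": 120})
b = ReplicaState(old_value=7, applied_count=3, last_applied_ts=110,
                 pending={"y": 120, "z": 130})
print(repair([a, b]))   # keeps b's old value with pending actions y and z
```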

On 4 Apr 2013 04:59, "aaron morton" <aaron@thelastpickle.com> wrote:
>
> I would guess not.
>
>> I know this goes against keeping updates idempotent,
>
> There are also issues with consistency, i.e. is the read local or does it happen at the CL level?
> And it makes things go slower.
>
>> We currently do things like this in client code, but it would be great to be able to do this on the server side to minimize the chance of race conditions.
>
> Sometimes you can write the plus one into a new column and then apply the changes in the reading client thread.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 4/04/2013, at 12:48 AM, Drew Kutcharian <drew@venarc.com> wrote:
>
>> Hi Guys,
>>
>> Are there any short/long term plans to support UPDATE operations that require read-before-write, such as an increment on a numeric non-counter column?
>> i.e.
>>
>> UPDATE CF SET NON_COUNTER_NUMERIC_COLUMN = NON_COUNTER_NUMERIC_COLUMN + 1;
>>
>> UPDATE CF SET STRING_COLUMN = STRING_COLUMN + "postfix";
>>
>> etc.
>>
>> I know this goes against keeping updates idempotent, but there are times you need to do these kinds of operations. We currently do things like this in client code, but it would be great to be able to do this on the server side to minimize the chance of race conditions.
>>
>> -- Drew
>
>
