From: Dominic Williams
Date: Wed, 22 Jun 2011 18:04:55 +0100
Subject: Re: No Transactions: An Example
To: user@cassandra.apache.org

Hi Trevor,

I hope to post on my practical experiences in this area soon - we rely heavily on complex serialized operations in FightMyMonster.com. Probably the simplest serialized operation we do is updating nugget balances when, for example, there has been a trade of monsters.

Currently we use ZooKeeper/Cages (github.com/s7) to serialize our distributed ops.

We don't implement transactions with rollback/commit. Rather, we lock some paths, for example /Users/bank/dominic and /Users/bank/ben, and then write with QUORUM through our Java client library Pelops. Pelops will make several attempts to retry the operation if it fails at first, and in our line of business it is enough that redundancy in the cluster means the write will nearly always complete eventually.
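As a rough illustration of the lock-then-write pattern above, here is a minimal Python sketch. It is not Cages or Pelops code: `lock_for`, `with_retries`, `transfer` and the in-memory `balances` dict are hypothetical stand-ins for the ZooKeeper locks and the QUORUM write, and acquiring the locks in sorted order is an assumption added here to avoid deadlock.

```python
import threading
import time
from contextlib import ExitStack

# Hypothetical in-process stand-ins: a real system would hold these locks in
# ZooKeeper and send the writes to Cassandra at QUORUM.
_locks = {}
_locks_guard = threading.Lock()
balances = {"dominic": 100, "ben": 50}


def lock_for(path):
    """Return the single lock registered for a lock path."""
    with _locks_guard:
        return _locks.setdefault(path, threading.Lock())


def with_retries(operation, attempts=3, delay=0.01):
    """Run operation; on failure retry, re-raising after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay)


def transfer(sender, receiver, amount):
    """Move nuggets between two accounts while holding both lock paths."""
    if sender == receiver:
        return
    # Assumption: a fixed (sorted) acquisition order avoids deadlock when two
    # transfers touch the same pair of accounts in opposite directions.
    paths = sorted(f"/Users/bank/{name}" for name in (sender, receiver))
    with ExitStack() as held:
        for path in paths:
            held.enter_context(lock_for(path))
        # Compute absolute balances once so a repeated write is harmless.
        new_values = {
            sender: balances[sender] - amount,
            receiver: balances[receiver] + amount,
        }
        with_retries(lambda: balances.update(new_values))


transfer("dominic", "ben", 30)
print(balances)
```

Writing absolute values rather than increments is what makes the retry safe in this sketch: repeating the same write after a partial failure leaves the same result.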

Of course, in a real-world money scenario that is not enough, and data inconsistency caused by, say, a sudden power outage during the retry phase is not acceptable. To handle this case I would like to extend Cages at some point so that commit/rollback transactions, which would be stored inside ZooKeeper, are associated with the distributed locks (which are stored persistently and survive power loss, for example). There is an old blog post which talks about it here: http://ria101.wordpress.com/2010/05/12/locking-and-transactions-over-cassandra-using-cages/ although this needs updating.

One interesting point not discussed there, which I have also not heard mentioned elsewhere, is that for serialization to work every time, after performing an update you must wait before releasing the lock for a brief period >= the maximum variance between the clocks on the application nodes updating the database, e.g. 1-2ms.

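This is because, during reconciliation, Cassandra uses the timestamps of the columns that have been written to determine which one should be persisted when they conflict. If the next lock holder's clock runs behind, its later write can carry an older timestamp and lose to the write it was meant to replace.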
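The effect of clock variance can be shown with a small simulation. This is only a sketch under stated assumptions, not Cassandra code: `Column`, `reconcile`, `surviving_value` and the `MAX_SKEW_MS` bound are hypothetical names, timestamps are plain integers in milliseconds, and tie-breaking between equal timestamps is not modelled, so the example waits one millisecond longer than the bound.

```python
from dataclasses import dataclass

MAX_SKEW_MS = 2  # assumed bound on clock variance between application nodes


@dataclass(frozen=True)
class Column:
    value: str
    timestamp_ms: int


def reconcile(a, b):
    """Last write wins: keep the column carrying the higher timestamp."""
    # Ties are resolved arbitrarily here; the usage below avoids them.
    return a if a.timestamp_ms >= b.timestamp_ms else b


def surviving_value(hold_ms):
    """Simulate two serialized writers and return the value that survives.

    Writer A's clock runs MAX_SKEW_MS ahead of true time; writer B's clock is
    accurate. A writes at true time 0, keeps the lock for hold_ms, and B
    writes as soon as it obtains the lock.
    """
    first = Column("written by A", 0 + MAX_SKEW_MS)
    second = Column("written by B", hold_ms)
    return reconcile(first, second).value


print(surviving_value(0))                # B's later write is discarded
print(surviving_value(MAX_SKEW_MS + 1))  # B's later write survives
```

With no wait, B's update is lost even though B held the lock after A, which is exactly the case the delay before releasing the lock is meant to rule out.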

As far as scaling goes, ZooKeeper can be scaled by having several clusters and hashing lock paths to them. Alternatively, Lamport's bakery algorithm could be investigated as this shows you can have locking without a central coordinator service.
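Hashing lock paths to clusters could look roughly like the sketch below. The cluster connection strings and the `cluster_for` helper are hypothetical, and SHA-256 is just one choice of deterministic hash; the only requirement is that every client maps the same path to the same cluster.

```python
import hashlib

# Hypothetical connection strings for three independent ZooKeeper clusters.
CLUSTERS = [
    "zk-a1:2181,zk-a2:2181,zk-a3:2181",
    "zk-b1:2181,zk-b2:2181,zk-b3:2181",
    "zk-c1:2181,zk-c2:2181,zk-c3:2181",
]


def cluster_for(lock_path, clusters=CLUSTERS):
    """Map a lock path to one cluster using a hash that is stable everywhere."""
    # Python's built-in hash() is randomized per process for strings, so every
    # client must use a deterministic digest to agree on the same cluster.
    digest = hashlib.sha256(lock_path.encode("utf-8")).digest()
    return clusters[int.from_bytes(digest[:8], "big") % len(clusters)]


for path in ("/Users/bank/dominic", "/Users/bank/ben"):
    print(path, "->", cluster_for(path))
```

Note that with a plain modulo, changing the number of clusters remaps most paths, so in this sketch the cluster list would have to stay fixed while any locks are held.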

Best, Dominic

On 22 June 2011 15:18, Trevor Smith <trevor@knewton.com> wrote:
Hello,

I was wondering if anyone had architecture thoughts of creating a simple bank account program that does not use transactions. I think creating an example project like this would be a good thing to have for a lot of the discussions that pop up about transactions and Cassandra (and non-transactional datastores in general).

Consider the simple system that has accounts, and users can transfer money between the accounts.

There are these interesting papers as background (links below).

Thank you.

Trevor Smith

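http://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf

http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-components-postattachments/00-09-20-52-14/BuildingOnQuicksand_2D00_V3_2D00_081212h_2D00_pdf.pdf

http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf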