Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of dwilliams@system7.co.uk
 designates 209.85.210.172 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <BANLkTi=MVU68oFDskXrkwDjDqj47QLxBfg@mail.gmail.com>
References: <BANLkTi=MVU68oFDskXrkwDjDqj47QLxBfg@mail.gmail.com>
From: Dominic Williams <dwilliams@system7.co.uk>
Date: Wed, 22 Jun 2011 18:04:55 +0100
Message-ID: <BANLkTinz7zWDO93RETNY0R_F3vvSryoZKw@mail.gmail.com>
Subject: Re: No Transactions: An Example
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=001636920cede4af2204a64ff86d

--001636920cede4af2204a64ff86d
Content-Type: text/plain; charset=ISO-8859-1

Hi Trevor,

I hope to post on my practical experiences in this area soon - we rely
heavily on complex serialized operations in FightMyMonster.com. Probably the
simplest serialized operation we do is updating nugget balances when, for
example, there has been a trade of monsters.

Currently we use ZooKeeper/Cages (github.com/s7) to serialize our
distributed ops.

We don't implement transactions with rollback/commit. Rather, we lock some
paths, for example /Users/bank/dominic and /Users/bank/ben, and then write
with QUORUM through our Java client library Pelops. Pelops retries the
operation several times if it fails at first, and in our line of business
that is enough: redundancy in the cluster means the write will nearly
always complete eventually.
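
To make the pattern concrete, here is a minimal sketch in Python. Everything
in it is a stand-in: in-process threading locks replace the ZooKeeper/Cages
locks, a dict replaces Cassandra, and lock_for, with_retries and transfer
are hypothetical names, not Cages or Pelops APIs. Sorting the paths before
locking is an assumption made here so that two transfers touching the same
accounts cannot deadlock.

```python
import threading
import time
from contextlib import ExitStack

# Hypothetical stand-ins: in-process locks replace ZooKeeper/Cages locks,
# and a dict replaces the Cassandra column family.
_locks = {}
_locks_guard = threading.Lock()
balances = {"/Users/bank/dominic": 100, "/Users/bank/ben": 50}


def lock_for(path):
    """Return the lock object for a path, creating it on first use."""
    with _locks_guard:
        return _locks.setdefault(path, threading.Lock())


def with_retries(op, attempts=5, delay=0.01):
    """Run op(), retrying on IOError; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return op()
        except IOError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)


def transfer(src, dst, amount, write=None):
    """Move amount from src to dst while holding both path locks."""
    if src == dst:
        raise ValueError("src and dst must differ")
    write = write or balances.update
    with ExitStack() as stack:
        # Acquire in sorted order so concurrent transfers cannot deadlock.
        for path in sorted([src, dst]):
            stack.enter_context(lock_for(path))
        new_values = {
            src: balances[src] - amount,
            dst: balances[dst] + amount,
        }
        # Stand-in for the QUORUM write, retried if it fails at first.
        with_retries(lambda: write(new_values))
```

Note that the write in this sketch is all-or-nothing, which a real
multi-row update interrupted between retries is not.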

Of course, in a real-world money scenario that is not enough: data
inconsistency caused by, say, a sudden power outage during the retry phase
is not acceptable. To handle this case I would like to extend Cages at some
point so that commit/rollback transactions, stored inside ZooKeeper, are
associated with the distributed locks (which are stored persistently and
survive power loss, for example). There is an old blog post that talks
about this, although it needs updating:
http://ria101.wordpress.com/2010/05/12/locking-and-transactions-over-cassandra-using-cages/

One interesting point that has not been discussed, and that I have not
heard mentioned elsewhere, is that for serialization to work every time you
must wait briefly after performing an update before you release the lock.
The wait must be at least the maximum variance between the clocks on the
application nodes updating the database, e.g. 1-2ms.

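This is because, during reconciliation, Cassandra uses the timestamps of
the columns that have been written to determine which one should be
persisted when they conflict: the highest timestamp wins. Without the wait,
the next lock holder could write with a timestamp from a slower clock, and
its newer value would lose to the older one.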
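
A small simulation shows the effect. This is only a sketch under stated
assumptions: reconcile and run are hypothetical names, timestamps are plain
millisecond numbers supplied by the clients, and the 0.1ms lock hand-off
time is invented for illustration.

```python
def reconcile(versions):
    """Last-write-wins: keep the value with the highest timestamp."""
    return max(versions, key=lambda v: v[0])[1]


def run(skew_ms, wait_ms):
    """Two clients update one column in turn under a lock.

    Client A's clock is skew_ms ahead of client B's. A writes first, waits
    wait_ms before releasing the lock, then B writes. Times are in ms.
    """
    true_time = 1000.0
    versions = []
    # A writes, stamping the column with its (fast) local clock.
    versions.append((true_time + skew_ms, "written by A"))
    # A holds the lock for wait_ms; the hand-off to B takes a further
    # 0.1ms (an assumed figure).
    true_time += wait_ms + 0.1
    # B writes, stamping the column with its (accurate) local clock.
    versions.append((true_time, "written by B"))
    return reconcile(versions)


# No wait: B wrote last, but A's older value survives reconciliation.
print(run(skew_ms=2, wait_ms=0))
# Waiting for at least the skew: B's newer value survives.
print(run(skew_ms=2, wait_ms=2))
```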

As far as scaling goes, ZooKeeper can be scaled by having several clusters
and hashing lock paths to them. Alternatively, Lamport's bakery algorithm
could be investigated as this shows you can have locking without a central
coordinator service.
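
Hashing lock paths to clusters could look something like the sketch below.
The ensemble connection strings and the ensemble_for function are
hypothetical. A stable digest is used rather than Python's built-in hash(),
which is randomized per process and would send the same path to different
clusters on different application nodes.

```python
import hashlib

# Hypothetical connection strings for three independent ZooKeeper clusters.
ENSEMBLES = [
    "zk-a1:2181,zk-a2:2181,zk-a3:2181",
    "zk-b1:2181,zk-b2:2181,zk-b3:2181",
    "zk-c1:2181,zk-c2:2181,zk-c3:2181",
]


def ensemble_for(lock_path, ensembles=ENSEMBLES):
    """Pick the cluster responsible for a lock path, deterministically."""
    digest = hashlib.sha256(lock_path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(ensembles)
    return ensembles[index]


for path in ("/Users/bank/dominic", "/Users/bank/ben"):
    print(path, "->", ensemble_for(path))
```

An operation that locks several paths may then span clusters, so it still
needs a consistent acquisition order across all of them.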

Best, Dominic

On 22 June 2011 15:18, Trevor Smith <trevor@knewton.com> wrote:

> Hello,
>
> I was wondering if anyone had architecture thoughts of creating a simple
> bank account program that does not use transactions. I think creating an
> example project like this would be a good thing to have for a lot of the
> discussions that pop up about transactions and Cassandra (and
> non-transactional datastores in general).
>
> Consider the simple system that has accounts, and users can transfer money
> between the accounts.
>
> There are these interesting papers as background (links below).
>
>  Thank you.
>
> Trevor Smith
>
> http://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf
>
>
> http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-components-postattachments/00-09-20-52-14/BuildingOnQuicksand_2D00_V3_2D00_081212h_2D00_pdf.pdf
>
> http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
>

--001636920cede4af2204a64ff86d--