incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Guy Incognito <dnd1...@gmail.com>
Subject Re: best practices for simulating transactions in Cassandra
Date Sun, 11 Dec 2011 01:53:02 GMT
you could try writing with the clock of the initial replay entry?

On 06/12/2011 20:26, John Laban wrote:
> Ah, neat.  It is similar to what was proposed in (4) above with adding 
> transactions to Cages, but instead of snapshotting the data to be 
> rolled back (the "before" data), you snapshot the data to be replayed 
> (the "after" data).  And then later, if you find that the transaction 
> didn't complete, you just keep replaying the transaction until it takes.
>
> The part I don't understand with this approach though:  how do you 
> ensure that someone else didn't change the data between your initial 
> failed transaction and the later replaying of the transaction?  You 
> could get lost writes in that situation.
>
> Dominic (in the Cages blog post) explained a workaround with that for 
> his rollback proposal:  all subsequent readers or writers of that data 
> would have to check for abandoned transactions and roll them back 
> themselves before they could read the data.  I don't think this is 
> possible with the XACT_LOG "replay" approach in these slides though, 
> based on how the data is indexed (cassandra node token + timeUUID).
>
>
> PS:  How are you liking Cages?
>
>
>
>
> 2011/12/6 Jérémy SEVELLEC <jsevellec@gmail.com 
> <mailto:jsevellec@gmail.com>>
>
>     Hi John,
>
>     I had exactly the same reflexions.
>
>     I'm using zookeeper and cage to lock et isolate.
>
>     but how to rollback?
>     It's impossible so try replay!
>
>     the idea is explained in this presentation
>     http://www.slideshare.net/mattdennis/cassandra-data-modeling (starting
>     from slide 24)
>
>     - insert your whole data into one column
>     - make the job
>     - remove (or expire) your column.
>
>     if there is a problem during "making the job", you keep the
>     possibility to replay and replay and replay (synchronously or in a
>     batch).
>
>     Regards
>
>     Jérémy
>
>
>     2011/12/5 John Laban <john@pagerduty.com <mailto:john@pagerduty.com>>
>
>         Hello,
>
>         I'm building a system using Cassandra as a datastore and I
>         have a few places where I am need of transactions.
>
>         I'm using ZooKeeper to provide locking when I'm in need of
>         some concurrency control or isolation, so that solves that
>         half of the puzzle.
>
>         What I need now is to sometimes be able to get atomicity
>         across multiple writes by simulating the
>         "begin/rollback/commit" abilities of a relational DB.  In
>         other words, there are places where I need to perform multiple
>         updates/inserts, and if I fail partway through, I would
>         ideally be able to rollback the partially-applied updates.
>
>         Now, I *know* this isn't possible with Cassandra.  What I'm
>         looking for are all the best practices, or at least tips and
>         tricks, so that I can get around this limitation in Cassandra
>         and still maintain a consistent datastore.  (I am using quorum
>         reads/writes so that eventual consistency doesn't kick my ass
>         here as well.)
>
>         Below are some ideas I've been able to dig up.  Please let me
>         know if any of them don't make sense, or if there are better
>         approaches:
>
>
>         1) Updates to a row in a column family are atomic.  So try to
>         model your data so that you would only ever need to update a
>         single row in a single CF at once.  Essentially, you model
>         your data around transactions.  This is tricky but can
>         certainly be done in some situations.
>
>         2) If you are only dealing with multiple row *inserts* (and
>         not updates), have one of the rows act as a 'commit' by
>         essentially validating the presence of the other rows.  For
>         example, say you were performing an operation where you wanted
>         to create an Account row and 5 User rows all at once (this is
>         an unlikely example, but bear with me).  You could insert 5
>         rows into the Users CF, and then the 1 row into the Accounts
>         CF, which acts as the commit.  If something went wrong before
>         the Account could be created, any Users that had been created
>         so far would be orphaned and unusable, as your business logic
>         can ensure that they can't exist without an Account.  You
>         could also have an offline cleanup process that swept away
>         orphans.
>
>         3) Try to model your updates as idempotent column inserts
>         instead.  How do you model updates as inserts?  Instead of
>         munging the value directly, you could insert a column
>         containing the operation you want to perform (like "+5").  It
>         would work kind of like the Consistent Vote Counting
>         implementation: ( https://gist.github.com/416666 ).  How do
>         you make the inserts idempotent?  Make sure the column names
>         correspond to a request ID or some other identifier that would
>         be identical across re-drives of a given (perhaps originally
>         failed) request.  This could leave your datastore in a
>         temporarily inconsistent state, but would eventually become
>         consistent after a successful re-drive of the original request.
>
>         4) You could take an approach like Dominic Williams proposed
>         with Cages:
>         http://ria101.wordpress.com/2010/05/12/locking-and-transactions-over-cassandra-using-cages/
>            The gist is that you snapshot all the original values that
>         you're about to munge somewhere else (in his case, ZooKeeper),
>         make your updates, and then delete the snapshot (and that
>         delete needs to be atomic).  If the snapshot data was never
>         deleted, then subsequent accessors (even readers) of the data
>         rows need to do the rollback of the previous transaction
>         themselves before they can read/write this data.  They do the
>         rollback by just overwriting the current values with what is
>         in the snapshot.  It offloads the work of the rollback to the
>         next worker that accesses the data.  This approach probably
>         needs an generic/high-level programming layer to handle all of
>         the details and complexity, and it doesn't seem like it was
>         ever added to Cages.
>
>
>         Are there other approaches or best practices that I missed?  I
>         would be very interested in hearing any opinions from those
>         who have tackled these problems before.
>
>         Thanks!
>         John
>
>
>
>
>
>     -- 
>     Jérémy
>
>


Mime
View raw message