cloudstack-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Darren Shepherd <>
Subject Re: [DISCUSS] Transaction Hell
Date Wed, 09 Oct 2013 17:29:07 GMT

>From a high level I think we'd probably agree.  Generally I feel an
IaaS platform is largely a metadata management framework that stores
the "desired" state of the infrastructure and then pro-actively tries
to reconcile the desired state with reality.  So failures should be
recovered from easily as inconsistency will be discovered and
reconciled.  Having sad that, ACS is not at all like that.  It is very
task oriented.  Hopefully I/we/everyone can change that, its a huge
concern of mine.  The general approach in ACS I see is do task X and
hopefully it works.  If it doesn't work, well hopefully we didn't
leave things in an inconsistent state.  If we find it does leave
things in an inconsistent state, write a cleanup thread to fix bad
things in bad states....

Regarding TX specifically.  This is a huge topic.  I really don't know
where to start.  I have so many complaints with the data access in
ACS.  There's what I'd like to see, but its so far from what it really
is.  Instead I'll address specifically your question.

I wish we were doing transaction per API, but I don't think that was
ever a consideration.  I do think the sync portion of API commands
should be wrapped in a single transaction.  I really think the
original intention of the Transaction framework was to assist in
cleaning up resources that people always forget to close.  I think
that is mostly it.

The general guidelines of how I'd like transactions to work would be

1) Synchronous portions of API commands are wrapped in a single
transaction.  Transaction propagation capability from spring tx can
then handle nesting transaction as more complicated transaction
management may be need in certain places.

2) Async jobs that run in a background threads should do small fine
grained transaction management.  Ideally no transactions.
Transactions should not be used as a locking mechanism.

Having said that, there are currently so many technical issues in
getting to that.  For example, with point 1, because IPC/MessageBus
and EventBus were added recently, that makes it difficult to do 1.
The problem is that you can't send a message while a DB tx is open
because the reciever may get the message before the commit.  So
messaging frameworks have to be written in consideration of the
transaction management.  Not saying you need to do complex XA style
transactions, there's simpler ways to do that.  So regarding points 1
and 2 I said.  That's what I'd like to see, but I know its a long road
to that.

Option B is really about introducing an API that will eventually serve
as a lightweight wrapper around Spring TX.  In the short term, if I do
option B, the implementation of the code will still be the custom ACS
TX mgmt.  So across modules, its sorta kinda works but not really.
But if I do the second step of replacing custom ACS TX impl with
Spring TX, it will follow how Spring TX works.  If we have Sprint TX
we can then leverage the transaction propagation features of it to
more sanely handle transaction nesting.

I feel I went a bit the weeds with that response, but maybe something
in there made sense.


On Wed, Oct 9, 2013 at 9:31 AM, Pedro Roque Marques
<> wrote:
> Darren,
> My assumption when I tried to make sense of the transaction code is that the underlying
motivation is that the code is trying to create a transaction per API call and then allow
multiple modules to implement that API call...
> i.e. the intent is do use a bit of what i would call a "web-server logic"...
> 1. API call starts.
> 2. Module X starts transaction...
> 3. Module Y does some other changes in the DB...
> 4. Either the API call completes successfully or not... commit or error back to the user.
>  I suspect that this was probably the starting point... but it doesn't really work as
i describe above. Often when the plugin i'm working on screws up (or XenServer is misconfigured)
one ends up with DB objects in inconsistent state.
> I suspect that the DB Transaction design needs to include what is the methodology for
the design of the management server.
> In an ideal world, i would say that API calls just check authorization and quotas and
should store the intent of the management server to reach the desired state. State machines
that can then deal with transient failures should then attempt to move the state of the system
to the state intended by the user. That however doesn't seem to reflect the current state
of the management server.
> I may be completely wrong... Can you give an example in proposal B of how a transaction
would span multiple modules of code ?
>   Pedro.
> On Oct 9, 2013, at 1:44 AM, Darren Shepherd wrote:
>> Okay, please read this all, this is important...  I want you all to
>> know that its personally important to me to attempt to get rid of ACS
>> custom stuff and introduce patterns, frameworks, libraries, etc that I
>> feel are more consistent with modern Java development and are
>> understood by a wider audience.  This is one of the basic reasons I
>> started the spring-modularization branch.  I just want to be able to
>> leverage Spring in a sane way.  The current implementation in ACS is
>> backwards and broken and abuses Spring to the point that leveraging
>> Spring isn't really all that possible.
>> So while I did the Spring work, I also started laying the ground work
>> to get rid of the ACS custom transaction management.  The custom DAO
>> framework and the corresponding transaction management has been a huge
>> barrier to me extending ACS in the past.  When you look at how you are
>> supposed to access the database, it's all very custom and what I feel
>> isn't really all that straight forward.  I was debugging an issue
>> today and figured out there is a huge bug in what I've done and that
>> has lead me down this rabbit hole of what the correct solution is.
>> Additionally ACS custom transaction mgmt is done in a way that
>> basically breaks Spring too.
>> At some point on the mailing list there was a small discussion about
>> removing the @DB interceptor.  The @DB interceptor does and
>> txn.close() around a method.  If a method forgets to commit or
>> rollback the txn, txn.close() will rollback the transaction for the
>> method.  So the general idea of the change was to instead move that
>> logic to the bottom of the call stack.  The assumption being that the
>> @DB code was just an additional check to ensure the programmer didn't
>> forget something and we could instead just do that once at the bottom
>> of the stack.  Oh how wrong I was.
>> The problem is that developers have relied on the @DB interceptor to
>> handle rollback for them.  So you see the following code quite a bit
>> txn.start()
>> ...
>> txn.commit()
>> And there is no sign of a rollback anywhere.  So the rollback will
>> happen if some exception is thrown.  By moving the @DB logic to the
>> bottom of stack what happens is the transaction is not rolled back
>> when the developer thought it would and madness ensues.  So that
>> change was bad.  So what to do....  Here's my totally bias description
>> of solutions:
>> Option A or "Custom Forever!":  Go back to custom ACS AOP and the @DB.
>> This is what one would think is the simplest and safest solution.
>> We'll it ain't really.  Here's the killer problem, besides that fact
>> that it makes me feel very sad inside, the current rollback behavior
>> is broken in certain spots in ACS.  While investigating possible
>> solutions I started looking at all the places that do programmatic txn
>> management.  It's important to realize that the txn framework only
>> works properly if the method in which you do txn.start() has @DB on
>> it.  There is a java assert in currentTxn() that attempts to make sure
>> that @DB is there.  But.... nobody runs with asserts on.  So there are
>> places in ACS where transactions are started and no @DB is there, but
>> it happens to work because some method in the stack has @DB.  So to
>> properly go back to option A we really need to fix all places that
>> don't have @DB, plus make sure people always run with asserts on.  And
>> then give up making the ACS world a better place and just do things
>> how we always have...
>> Option B or "Progress is Good":  The current transaction management
>> approach (especially rollback) doesn't match how the majority of
>> frameworks out there work.  This option is to change the Transaction
>> class API to be more consistent with standard frameworks out there.  I
>> propose the following APIs (if you know Spring TX mgmt, this will look
>> familiar)
>> 1) remove start(), commit(), rollback() - The easiest way to ensure we
>> up date everything properly is to break the API and fix everything
>> that is broken (about 433 places)
>> 2) Add execute(TransactionCallback) where TransactionCallback has one
>> method doInTransaction().  For somebody to run a transaction you would
>> need to do
>> txn.execute(new TransactionCallback() {
>> Object doInTransaction() {
>>  // do stuff
>> }
>> })
>> 3) add "Object startTransaction()," commit(Object), and
>> rollback(Object) - These methods are for callers who really really
>> want to do thing programmatically.  To run a transaction you would do
>> Object status = txn.startTransaction()
>> try {
>>  //.. do stuff
>>  txn.commit(status)
>> } catch (Exception e) {
>>  txn.rollback(status)
>> }
>> I'm perfectly willing to go and change all the code for this.  It will
>> just take a couple hours or so.  Option B is purposely almost exactly
>> like Spring PlatformTransactionManager.  The reason being if we switch
>> all the code to this style, we can later drop the implementation of
>> Transaction and move to 100% fully Spring TX managed.
>> Just as a final point, every custom approach or framework we have adds
>> a barrier to people extending ACS and additionally puts more burden on
>> the ACS community as that is more code we have to support.  If
>> somebody today wants to know how DB transaction propagation works,
>> there's zero documentation on it and probably 2 people who know how it
>> works.  If we use a standard framework then somebody can just refer to
>> that frameworks documentation, community, or stackexchange.
>> So either option we can do and I'm opening this up to the community to
>> decide.  If we go with Option A, a small portion of me will die
>> inside, but so be it.
>> Darren

View raw message