cloudstack-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marcus <shadow...@gmail.com>
Subject Re: [DESIGN] No agent calls within database transactions....
Date Mon, 31 Mar 2014 22:15:14 GMT
Yeah, that assert issue has bitten us once or twice, and I know Ryan
squawked about it at some point.  Do we have any point where
enforcement will occur (BVT or some other tests)?

On Mon, Mar 31, 2014 at 4:04 PM, Alex Huang <Alex.Huang@citrix.com> wrote:
> Hi All,
>
> I was alerted to this problem recently and it's something that affects developers so
I want to bring it up.  It is a design principle in CloudStack that we do not make agent calls
within database transactions.  The reason is because when you make a call to an external system,
there's no guarantee on how long the call takes or even whether the call returns.  When a
call takes a long time, several bad things can happen:
>         - The MySQL DB Connection held opened due to the DB transaction goes into idle.
Eventually, a timeout in MySQL hits and the connection gets severed and the transaction is
rolled back.  By default, this timeout is 45 seconds but can be changed via a parameter in
my.cnf.  So it's problem that the agent call completes just fine but the DB transaction rolls
back and changes are undone.
>         - The rows locked in that transaction before the remote agent call could be holding
up other foreign key checks into the table.  MySQL runs foreign key checks in transactions
to make sure the data modification and the checks are done atomically.  Therefore, these checks
must wait for other transactions to complete.  Hence, an agent call that takes sometime can
severely slow down the system, particularly under scale.
>
> We have two solutions to this:
>         - Drive agent interactions with states.  There are many examples of this in VM,
Volume, etc.
>         - When the above cannot be done, acquire a lock in the lock table via a DAO method
call.  Locks do not maintain DB transactions and therefore will not run into this problem.
 However, you are responsible for releasing locks.  It used to be that if you forget to release
the locks, the @DB annotation automatically releases locks once it went out of the scope and
asserts to alert the developer.  However, the @DB annotation has been removed in the Spring
work so I'm not sure if it's still done.
>
> This is a tough problem to solve because
>         1. It usually works just fine during functional testing.  During scale testing,
this problem surfaces and often in unexpected places due to the foreign key check problem.
>         2. For developers, it is difficult for them to know if a method that they're
calling within a transaction ends up in an agent call.
>
> There is an assert in AgentManager to ensure that there are no db transactions before
making a agent call.  Apparently, since the conversion to Maven, no one actually runs with
assert on any more.  Due to that, this design principle has been lost in CloudStack and we're
finding more and more calls being made in DB transactions.   To counter that, I decided to
add a global parameter that turns the assert to an actual exception.  It is advised that all
developers set this global parameter, check.txn.before.sending.agent.commands, during their
own testing to make sure it doesn't call agent calls in transactions.
>
> --Alex
>
>

Mime
View raw message