db-derby-dev mailing list archives

From "Kristian Waagan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-4137) OOM issue using XA with timeouts
Date Wed, 08 Jun 2011 11:35:58 GMT

    [ https://issues.apache.org/jira/browse/DERBY-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045908#comment-13045908 ]

Kristian Waagan commented on DERBY-4137:
----------------------------------------

> Does the OOME happen because there are too many CancelXATransactionTask objects, or because
> the CancelXATransactionTask objects reference too many other objects via the XATransactionState?

Both, but the primary factor is the objects referenced via XATransactionState.

* Current code (top three classes at OOME):

  Instances   Total bytes   Class
      91172       7859790   [B (byte[])
      43581       1830402   org.apache.derby.jdbc.XATransactionState
      43581       1220268   org.apache.derby.jdbc.XATransactionState$CancelXATransactionTask

* Nullifying the reference to XATransactionState (top three classes at OOME):

  Instances   Total bytes   Class
     261134       7311752   org.apache.derby.jdbc.XATransactionState$CancelXATransactionTask
       4010       1584124   [B (byte[])
          1       1048584   [Ljava.util.TimerTask; (TimerTask[])

The memory requirements are determined by the transaction rate (number of objects) and the
timeout value (lifetime of objects). A CancelXATransactionTask seems to occupy 28 bytes, which
is rather small. I have no idea what transaction rates and timeout values people are using, but
maybe the alternative solution you are proposing will be good enough for most use cases?
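As a rough illustration of how rate and timeout combine: a cancelled task stays queued until its original timeout would have fired, so the steady-state number of queued tasks is about (transaction rate × timeout), and their memory is that count times the per-object size. The calculator below is a hypothetical back-of-the-envelope sketch, not Derby code. The 28-byte figure comes from the histogram above; the rate and timeout in main() are invented inputs. It counts only the task objects, i.e. the case where the reference to XATransactionState has been nulled out; with the reference intact, each task also pins its transaction state and everything reachable from it.

```java
// Hypothetical back-of-the-envelope estimate of heap held by cancelled but
// not yet purged timer tasks. Only TASK_BYTES comes from the measurements
// above; the rate and timeout used in main() are made-up example inputs.
public class RetainedTaskEstimate {

    /** Shallow size of one CancelXATransactionTask, from the heap histogram. */
    static final long TASK_BYTES = 28;

    /**
     * Steady-state number of queued tasks: every task scheduled within the
     * last timeoutSeconds is still in the queue, cancelled or not.
     */
    static long liveTasks(long txPerSecond, long timeoutSeconds) {
        return txPerSecond * timeoutSeconds;
    }

    /** Bytes held by the task objects alone (no referenced state counted). */
    static long retainedBytes(long txPerSecond, long timeoutSeconds, long bytesPerTask) {
        return liveTasks(txPerSecond, timeoutSeconds) * bytesPerTask;
    }

    public static void main(String[] args) {
        long rate = 1_000;  // hypothetical: 1000 XA transactions per second
        long timeout = 300; // hypothetical: 5-minute transaction timeout
        System.out.printf("live tasks: %d, task bytes: %d%n",
                liveTasks(rate, timeout), retainedBytes(rate, timeout, TASK_BYTES));
    }
}
```

With these hypothetical inputs it prints `live tasks: 300000, task bytes: 8400000`, roughly 8 MB for the task objects alone.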

I'll write a patch for it.
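The histograms above show cancelled CancelXATransactionTask objects piling up. The underlying java.util.Timer behaviour can be reproduced without any Derby classes: TimerTask.cancel() only marks a task as cancelled, and the task stays in the timer's queue until Timer.purge() runs or the task reaches the head of the queue (in the OpenJDK implementation, the timer thread discards a cancelled task only when it reaches the head). This is a minimal sketch; the class and method names are made up. An uncancelled "sentinel" task with the earliest time is kept at the head so the timer thread stays asleep and the result is deterministic.

```java
import java.util.Timer;
import java.util.TimerTask;

// Demonstrates that cancelled TimerTasks remain referenced by the Timer's
// queue until purge() is called. Names here are illustrative only.
public class CancelRetentionDemo {

    private static final long MINUTE = 60L * 1000;

    /** A do-nothing task, standing in for a transaction timeout task. */
    private static TimerTask noop() {
        return new TimerTask() {
            @Override
            public void run() {
                // never reached in this demo
            }
        };
    }

    /** Schedules and cancels n tasks, then returns how many purge() removed. */
    static int scheduleCancelAndPurge(int n) {
        Timer timer = new Timer(true);           // daemon thread: does not block JVM exit
        try {
            timer.schedule(noop(), 30 * MINUTE); // sentinel: earliest task, never cancelled
            for (int i = 0; i < n; i++) {
                TimerTask task = noop();
                timer.schedule(task, 60 * MINUTE); // e.g. a one-hour timeout
                task.cancel();                     // sets the flag; task stays queued
            }
            // purge() removes cancelled tasks from the queue and returns how
            // many it removed, i.e. how many the timer was still holding.
            return timer.purge();
        } finally {
            timer.cancel();                      // stop the timer thread
        }
    }

    public static void main(String[] args) {
        System.out.println("removed by purge(): " + scheduleCancelAndPurge(1000));
    }
}
```

Running main() prints `removed by purge(): 1000`: every cancelled task was still held by the timer until purge() removed it.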

> OOM issue using XA with timeouts
> --------------------------------
>
>                 Key: DERBY-4137
>                 URL: https://issues.apache.org/jira/browse/DERBY-4137
>             Project: Derby
>          Issue Type: Bug
>          Components: JDBC
>    Affects Versions: 10.4.2.0
>            Reporter: Ronald Tschalaer
>            Assignee: Kristian Waagan
>              Labels: derby_triage10_5_2
>         Attachments: derby-4137-1a-purge_on_cancel.diff, derby-4137-1a-purge_on_cancel.stat, derby-4137-1b-purge_on_cancel.diff
>
>
> When using JTA for transaction control and a transaction timeout is set,
> EmbedXAResource ends up calling XATransactionState.scheduleTimeoutTask() which
> in turn registers a timeoutTask with java.util.Timer. In the normal case where
> the transaction finishes before the timeout, XATransactionState.xa_finalize()
> then calls timeoutTask.cancel(). So far, so good. The problem, however, is
> that java.util.TimerTask.cancel() does not actually remove the task from the
> timer queue, meaning that a strong reference to the timeoutTask is kept (and
> through that to XATransactionState, the EmbedConnection, etc). The reference
> is not removed until the time at which the timeout would have fired, which can
> be a long time. Under load this can quickly lead to an OOM situation.
> A simple fix is to call Timer.purge() every so often. Although the javadocs
> describe purge() as rarely needed and not particularly cheap, I've found that
> calling it after every cancel() is the best approach, for several reasons:
>   1) the scenario here is that almost all tasks are cancelled, which fits the
>      Timer.purge() description of an "application that cancels a large number
>      of tasks";
>   2) there usually isn't a very large number of simultaneous transactions, so
>      purge() is actually quite cheap;
>   3) this ensures the strong reference is removed immediately, allowing the GC
>      to do a better job.
> Interestingly enough, I've had this exact issue with a different type of
> database; I tested purge() there and found it to be in the sub-microsecond
> range for 100 transactions (or similar; I don't recall the exact data), i.e.
> completely negligible.
> In short, my suggestion is to change xa_finalize as follows:
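>     synchronized void xa_finalize() {
>         if (timeoutTask != null) {
>             // cancel() only flags the task; it stays in the timer queue
>             timeoutTask.cancel();
>             // purge() drops cancelled tasks so they (and the state they
>             // reference) become eligible for garbage collection
>             Monitor.getMonitor().getTimerFactory().
>                     getCancellationTimer().purge();
>         }
>         isFinished = true;
>     }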
> As a temporary workaround, applications can do this themselves, i.e.
> add something like the following whenever they close a Connection:
>   import org.apache.derby.iapi.services.monitor.Monitor;
>   Monitor.getMonitor().getTimerFactory().getCancellationTimer().purge();
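The change proposed for xa_finalize() can be sketched without Derby internals. In the sketch below, a plain shared java.util.Timer stands in for the timer returned by getCancellationTimer(), and the class TimedTransaction with its methods scheduleTimeout() and finish() is hypothetical; only Timer and TimerTask are real APIs. The timeout task is an anonymous inner class, so it implicitly references its TimedTransaction, much as CancelXATransactionTask references XATransactionState.

```java
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical sketch of "cancel, then purge" when a transaction finishes.
// TimedTransaction is not a Derby class; the shared Timer plays the role of
// Derby's cancellation timer.
public class TimedTransaction {

    private final Timer timer;     // shared by all transactions
    private TimerTask timeoutTask; // pending timeout, or null if none
    private boolean isFinished;

    TimedTransaction(Timer timer) {
        this.timer = timer;
    }

    /** Schedules a task that would abort the transaction on timeout. */
    synchronized void scheduleTimeout(long timeoutMillis) {
        // Anonymous inner class: holds an implicit reference to this object.
        timeoutTask = new TimerTask() {
            @Override
            public void run() {
                // in Derby, this is where the XA transaction would be cancelled
            }
        };
        timer.schedule(timeoutTask, timeoutMillis);
    }

    /** Mirrors the suggested xa_finalize(): cancel the task, then purge. */
    synchronized void finish() {
        if (timeoutTask != null) {
            timeoutTask.cancel(); // only marks the task as cancelled
            timer.purge();        // removes it from the queue right away
            timeoutTask = null;   // drop our own reference as well
        }
        isFinished = true;
    }

    synchronized boolean isFinished() {
        return isFinished;
    }

    public static void main(String[] args) {
        Timer shared = new Timer(true); // daemon timer thread
        for (int i = 0; i < 1000; i++) {
            TimedTransaction tx = new TimedTransaction(shared);
            tx.scheduleTimeout(60L * 60 * 1000); // hypothetical one-hour timeout
            tx.finish();                         // completes well before the timeout
        }
        // Every cancelled task was already purged, so nothing is left to remove.
        System.out.println("left for purge(): " + shared.purge());
        shared.cancel();
    }
}
```

main() prints `left for purge(): 0`. Purging on every finish follows the reporter's reasoning that almost all tasks are cancelled and the queue is usually short; when the queue can grow large, purging only every N-th cancel is a possible trade-off.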

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
