db-derby-dev mailing list archives

From "Brett Bergquist (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DERBY-6879) Engine deadlock between XA timeout handling and cleanupOnError
Date Mon, 27 Jun 2016 00:02:51 GMT

    [ https://issues.apache.org/jira/browse/DERBY-6879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15347036#comment-15347036 ]

Brett Bergquist edited comment on DERBY-6879 at 6/27/16 12:02 AM:
------------------------------------------------------------------

I am questioning whether this is the final correct answer.  In this one case, cleanupOnError
is being called by a connection that is executing a statement, so the lock order is the
connection and then the XATransactionState.  When a timeout occurs, the lock order is the
XATransactionState and then the connection.  When both happen at the same time, the deadlock
occurs.  My proposed fix handles this situation.
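
To make the two lock orders concrete, here is a minimal sketch. It is not Derby code: the two
ReentrantLock objects are hypothetical stand-ins for the EmbedConnection and XATransactionState
monitors, and tryLock with a timeout is used only so that the demo terminates instead of
hanging the way the real synchronized blocks do.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderInversionDemo {
    // Hypothetical stand-ins for the two monitors named in the thread dump.
    static final ReentrantLock connection = new ReentrantLock();
    static final ReentrantLock xaState = new ReentrantLock();

    /** Holds 'first', then tries 'second' for a bounded time; returns true if 'second' was obtained. */
    static boolean holdThenTry(ReentrantLock first, ReentrantLock second,
                               CountDownLatch bothHoldFirst, CountDownLatch bothTried)
            throws InterruptedException {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();   // wait until the other thread also holds its first lock
            boolean got = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (got) {
                second.unlock();
            }
            bothTried.countDown();
            bothTried.await();       // keep 'first' held until both attempts have finished
            return got;
        } finally {
            first.unlock();
        }
    }

    /** Runs both paths concurrently; [0] is the statement path, [1] is the timeout path. */
    static boolean[] run() throws Exception {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Statement error path: connection, then XATransactionState.
            Future<Boolean> statement =
                pool.submit(() -> holdThenTry(connection, xaState, bothHoldFirst, bothTried));
            // Timeout path: XATransactionState, then connection.
            Future<Boolean> timeout =
                pool.submit(() -> holdThenTry(xaState, connection, bothHoldFirst, bothTried));
            return new boolean[] { statement.get(), timeout.get() };
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] got = run();
        System.out.println("statement path obtained XATransactionState: " + got[0]);
        System.out.println("timeout path obtained connection: " + got[1]);
    }
}
```

Both lines print false: each thread holds the lock the other one needs, which is the cycle
shown in the thread dump. With plain synchronized blocks there is no timeout, so the threads
wait forever.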

There are other possibilities.  For example, code calls XAResource.commit.  This has the
locking pattern of XATransactionState and then connection.  So if the timeout were to occur
during this time, there would now be a deadlock here, because the proposed solution is
going to lock the connection and then the XATransactionState.
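
The reasoning above can be checked mechanically with a small sketch. Again, this is not Derby
code: the class, the constants, and the lock names are illustrative assumptions taken from the
lock orders described in this comment. The check looks only at acquisition order and says
nothing about whether two paths can actually run at the same time.

```java
class LockOrderCheck {
    // Lock orders as described in this comment (illustrative labels, not Derby identifiers).
    static final String[] STATEMENT_ERROR  = { "connection", "XATransactionState" };
    static final String[] TIMEOUT_ORIGINAL = { "XATransactionState", "connection" };
    static final String[] TIMEOUT_PROPOSED = { "connection", "XATransactionState" };
    static final String[] COMMIT           = { "XATransactionState", "connection" };

    /** True when two paths take the same pair of locks in opposite order. */
    static boolean opposite(String[] a, String[] b) {
        return a[0].equals(b[1]) && a[1].equals(b[0]);
    }

    public static void main(String[] args) {
        System.out.println("original: statement error vs timeout: "
                + opposite(STATEMENT_ERROR, TIMEOUT_ORIGINAL));
        System.out.println("proposed: statement error vs timeout: "
                + opposite(STATEMENT_ERROR, TIMEOUT_PROPOSED));
        System.out.println("proposed: timeout vs XAResource.commit: "
                + opposite(TIMEOUT_PROPOSED, COMMIT));
    }
}
```

Under these assumptions the proposed order removes the conflict with the statement error path
but introduces one with the commit path.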

The original deadlock problem can only occur if XATransactionState.cleanupOnError is called
while the connection is locked and executing and a cleanup needs to be performed.


was (Author: bbergquist):
I am questioning whether this is the final correct answer.  In this one case, cleanupOnError
is being called by a connection that is executing a statement, so the lock order is the
connection and then the XATransactionState.  When a timeout occurs, the lock order is the
XATransactionState and then the connection.  When both happen at the same time, the deadlock
occurs.  My proposed fix handles this situation.

There are other possibilities.  For example, code calls XAResource.commit.  This has the
locking pattern of XATransactionState and then connection.  So if the timeout were to occur
during this time, there would not be a deadlock here, because the proposed solution is
going to lock the connection and then the XATransactionState.

The original deadlock problem can only occur if XATransactionState.cleanupOnError is called
while the connection is locked and executing and a cleanup needs to be performed.

> Engine deadlock between XA timeout handling and cleanupOnError
> --------------------------------------------------------------
>
>                 Key: DERBY-6879
>                 URL: https://issues.apache.org/jira/browse/DERBY-6879
>             Project: Derby
>          Issue Type: Bug
>          Components: Services
>    Affects Versions: 10.10.2.0
>         Environment: Solaris 10.5 on Oracle M5000 
>            Reporter: Brett Bergquist
>         Attachments: derby-6879-test.diff
>
>
> Deadlock between XA timer cleanup task and the ContextManager.cleanupOnError
> Found one Java-level deadlock:
> =============================
> "DRDAConnThread_34":
>   waiting to lock monitor 0x0000000104b14d18 (object 0xfffffffd9090f058, a org.apache.derby.jdbc.XATransactionState),
>   which is held by "Timer-0"
> "Timer-0":
>   waiting to lock monitor 0x00000001038b96e8 (object 0xfffffffd9090d8b0, a org.apache.derby.impl.jdbc.EmbedConnection40),
>   which is held by "DRDAConnThread_34"
>  
> Java stack information for the threads listed above:
> ===================================================
> "DRDAConnThread_34":
>      at org.apache.derby.jdbc.XATransactionState.cleanupOnError(Unknown Source)
>      - waiting to lock <0xfffffffd9090f058> (a org.apache.derby.jdbc.XATransactionState)
>      at org.apache.derby.iapi.services.context.ContextManager.cleanupOnError(Unknown Source)
>      at org.apache.derby.impl.jdbc.TransactionResourceImpl.cleanupOnError(Unknown Source)
>      at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
>      at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
>      at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
>      at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
>      - locked <0xfffffffd9090d8b0> (a org.apache.derby.impl.jdbc.EmbedConnection40)
>      at org.apache.derby.impl.jdbc.EmbedPreparedStatement.executeStatement(Unknown Source)
>      at org.apache.derby.impl.jdbc.EmbedPreparedStatement.execute(Unknown Source)
>      at org.apache.derby.iapi.jdbc.BrokeredPreparedStatement.execute(Unknown Source)
>      at org.apache.derby.impl.drda.DRDAStatement.execute(Unknown Source)
>      at org.apache.derby.impl.drda.DRDAConnThread.parseEXCSQLSTTobjects(Unknown Source)
>      at org.apache.derby.impl.drda.DRDAConnThread.parseEXCSQLSTT(Unknown Source)
>      at org.apache.derby.impl.drda.DRDAConnThread.processCommands(Unknown Source)
>      at org.apache.derby.impl.drda.DRDAConnThread.run(Unknown Source)
> "Timer-0":
>      at org.apache.derby.impl.jdbc.EmbedConnection.xa_rollback(Unknown Source)
>      - waiting to lock <0xfffffffd9090d8b0> (a org.apache.derby.impl.jdbc.EmbedConnection40)
>      at org.apache.derby.jdbc.XATransactionState.cancel(Unknown Source)
>      - locked <0xfffffffd9090f058> (a org.apache.derby.jdbc.XATransactionState)
>      at org.apache.derby.jdbc.XATransactionState$CancelXATransactionTask.run(Unknown Source)
>      at java.util.TimerThread.mainLoop(Timer.java:555)
>      at java.util.TimerThread.run(Timer.java:505)
>  
> Found 1 deadlock.
> This deadlock caused Derby to create 18000 transaction recovery logs because the XA
> transaction was not cleaned up when the timeout fired.  Rebooting the system would cause a
> 50-hour boot time to process the transaction logs, so recovery had to be done by going to a
> backup database from before the issue occurred.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
