db-derby-dev mailing list archives

From "Brett Bergquist (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-5552) Derby threads hanging when using ClientXADataSource and a deadlock or lock timeout occurs
Date Thu, 29 Dec 2011 18:03:31 GMT

    [ https://issues.apache.org/jira/browse/DERBY-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177292#comment-13177292 ]

Brett Bergquist commented on DERBY-5552:

I have found the cause of the problem.  When a lock timeout or deadlock is detected, the server
calls XATransactionState.cleanupOnError.   This looks like:

	public void cleanupOnError(Throwable t) {

		if (t instanceof StandardException) {

			StandardException se = (StandardException) t;

			if (se.getSeverity() >= ExceptionSeverity.SESSION_SEVERITY) {
				// ... (session-severity handling not shown in this excerpt)
			}

			if (se.getSeverity() == ExceptionSeverity.TRANSACTION_SEVERITY) {

				synchronized (this) {
					// disable use of the connection until it is cleaned up.
					associationState = TRO_FAIL;
					if (SQLState.DEADLOCK.equals(se.getMessageId()))
						rollbackOnlyCode = XAException.XA_RBDEADLOCK;
					else if (SQLState.LOCK_TIMEOUT.equals(se.getMessageId()))
						rollbackOnlyCode = XAException.XA_RBTIMEOUT;
					else
						rollbackOnlyCode = XAException.XA_RBOTHER;
				}
				// ...
			}
		}
		// ...
	}
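As a hedged illustration of the classification in the excerpt above, here is a minimal standalone sketch that maps a Derby SQLState to the XA rollback-only code cleanupOnError records. It assumes the SQLState strings "40001" (deadlock) and "40XL1" (lock timeout); the class and method names are hypothetical, and only the XAException constants come from the standard javax.transaction.xa API.

```java
import javax.transaction.xa.XAException;

// Hypothetical helper mirroring the if/else chain in cleanupOnError.
// This is NOT Derby code, only a sketch of the same classification.
public final class XaRollbackCodes {

    // Assumed Derby SQLState values for deadlock and lock timeout.
    static final String DEADLOCK_SQLSTATE = "40001";
    static final String LOCK_TIMEOUT_SQLSTATE = "40XL1";

    private XaRollbackCodes() {}

    /** Returns the XA rollback reason that would be recorded for this SQLState. */
    static int forSqlState(String sqlState) {
        if (DEADLOCK_SQLSTATE.equals(sqlState)) {
            return XAException.XA_RBDEADLOCK;
        } else if (LOCK_TIMEOUT_SQLSTATE.equals(sqlState)) {
            return XAException.XA_RBTIMEOUT;
        } else {
            // Any other transaction-severity error.
            return XAException.XA_RBOTHER;
        }
    }

    public static void main(String[] args) {
        for (String state : new String[] {"40001", "40XL1", "22003"}) {
            System.out.println(state + " -> " + forSqlState(state));
        }
    }
}
```

The final else is what keeps XA_RBOTHER from overwriting the deadlock and timeout codes.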

The problem is the line of code:


The problem shows up on the client side. When the SQLException is received, the client calls
Sqlca.getMessage() to retrieve the formatted exception message. This makes a call back down
to the server on the same connection, which ends up in EmbedStatement.checkStatus(). Because
the EmbedConnection now has a null "applicationConnection", a noCurrentConnection error is
thrown. The DRDA code that receives this exception while processing Sqlca.getMessage() decides
that a protocol error has occurred and disconnects from the server.
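The failure chain above can be sketched as a toy model. Every class and method name below is hypothetical (a stand-in, not Derby's real implementation), and an IllegalStateException stands in for the noCurrentConnection SQLException. The boolean parameter models whether the error cleanup nulls the application connection.

```java
// Toy model of the DERBY-5552 failure chain; all names are hypothetical
// stand-ins, not Derby's real implementation.
public final class MessageRoundTripModel {

    /** Stands in for the embedded connection on the server. */
    static final class ServerConnection {
        Object applicationConnection = new Object();

        /** Mimics the status check that runs before any statement executes. */
        void checkStatus() {
            if (applicationConnection == null) {
                // Stand-in for the noCurrentConnection error.
                throw new IllegalStateException("No current connection.");
            }
        }
    }

    /** Stands in for the network-server session serving one client. */
    static final class Session {
        final ServerConnection conn = new ServerConnection();
        boolean disconnected = false;

        /** Server-side handling of an XA lock error. */
        void cleanupOnLockError(boolean nullConnection) {
            if (nullConnection) {
                conn.applicationConnection = null;
            }
        }

        /** Client asks the server to format the error message (cf. Sqlca.getMessage()). */
        String fetchMessage(String sqlState) {
            try {
                conn.checkStatus();
                return "Formatted message for SQLState " + sqlState;
            } catch (IllegalStateException e) {
                // The unexpected failure is treated as a protocol error: drop the session.
                disconnected = true;
                return null;
            }
        }
    }

    static Session run(boolean nullConnection) {
        Session s = new Session();
        s.cleanupOnLockError(nullConnection);
        s.fetchMessage("40XL1");
        return s;
    }

    public static void main(String[] args) {
        System.out.println("with nulling:    disconnected=" + run(true).disconnected);
        System.out.println("without nulling: disconnected=" + run(false).disconnected);
    }
}
```

In this model only the variant that nulls the connection loses the session, which matches the behavior described once the offending line is commented out.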

The XA transaction that was in progress never has "end" called on it, and the XA transaction
on the client side is now lost. Derby now has an XA transaction that will never end, causing
all kinds of havoc, such as retaining the log for all new transactions in case the lost one
is ever rolled back. The file system fills up with transaction logs, restarting the database
engine takes days, etc.
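Why one orphaned transaction pins the log can be shown with a toy model. This is a sketch under stated assumptions, not Derby's checkpoint code: it assumes log is discarded at whole-file granularity and that a checkpoint must keep every file from the first one written by the oldest still-active transaction. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: a checkpoint may only discard log files that no active
// transaction could still need for rollback. Names are hypothetical.
public final class LogTruncationModel {

    // Transaction id -> number of the log file holding its first record.
    private final Map<String, Long> activeFirstLogFile = new HashMap<>();
    private long currentLogFile = 1;

    void begin(String txId) {
        activeFirstLogFile.put(txId, currentLogFile);
    }

    void end(String txId) {
        activeFirstLogFile.remove(txId);
    }

    void switchLogFile() {
        currentLogFile++;
    }

    /** Oldest log file that must be retained after a checkpoint. */
    long oldestFileToKeep() {
        long keep = currentLogFile;
        for (long first : activeFirstLogFile.values()) {
            keep = Math.min(keep, first);
        }
        return keep;
    }

    public static void main(String[] args) {
        LogTruncationModel log = new LogTruncationModel();
        log.begin("lost-xa"); // never ended, like the orphaned XA branch
        for (int i = 0; i < 1000; i++) {
            String tx = "tx" + i;
            log.begin(tx);
            log.end(tx);
            log.switchLogFile();
        }
        // The lost transaction pins file 1, so every later file is retained.
        System.out.println("current=" + log.currentLogFile + " keep from=" + log.oldestFileToKeep());
    }
}
```

Ending the lost transaction immediately moves the retention point up to the current file, which is why a single missing "end" is enough to make the log grow without bound.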

I have commented out the above line, and now the correct lock error is reported at the
client. However, I don't yet know whether doing so has any other ramifications.

> Derby threads hanging when using ClientXADataSource and a deadlock or lock timeout occurs
> -----------------------------------------------------------------------------------------
>                 Key: DERBY-5552
>                 URL: https://issues.apache.org/jira/browse/DERBY-5552
>             Project: Derby
>          Issue Type: Bug
>          Components: Network Server
>    Affects Versions:
>         Environment: Solaris 10, Glassfish V2.1.1,
>            Reporter: Brett Bergquist
>            Priority: Blocker
>         Attachments: appserverstack.txt, client.tar.Z, derby.log, derbystackatshutdown.txt,
>                      execute.patch, transactionsleft.txt
> The issue arises when multiple XA transactions run in parallel and either a lock timeout
> or a deadlock is detected. When this happens, the connection is leaked in the Glassfish
> connection pool and the client thread hangs in "org.apache.derby.client.net.Reply.fill(Reply.java:172)".
> Shutting down the app server fails because the thread holds a lock on "org.apache.derby.client.net.NetConnection40"
> and another task, calling "org.apache.derby.client.ClientPooledConnection.close(ClientPooledConnection.java:214)",
> is waiting for that lock.
> Killing the app server with "kill" and then attempting to shut down the Derby Network Server
> causes the Network Server to hang. One thread hangs waiting for a lock at
> "org.apache.derby.impl.drda.NetworkServerControlImpl.removeFromSessionTable(NetworkServerControlImpl.java:1525)",
> which the "main" thread holds at "org.apache.derby.impl.drda.NetworkServerControlImpl.executeWork(NetworkServerControlImpl.java:2242)".
> The "main" thread is itself waiting for a lock that belongs to a thread in the TIMED_WAITING state,
> stuck at "org.apache.derby.impl.services.locks.ActiveLock.waitForGrant(ActiveLock.java:118)".
> At this point, the only way out is to kill the Network Server with "kill".
> There are transactions left even though all clients have been removed.
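The client-side hang in the report follows a familiar pattern: one thread holds the connection's lock while blocked waiting for a reply that never arrives, so close() can never acquire it. A minimal sketch of that pattern, assuming a CountDownLatch that is never released as a stand-in for the blocked socket read in Reply.fill. The report describes a monitor on NetConnection40; this hypothetical model uses a ReentrantLock instead, only so the waiting side can time out rather than hang.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Toy reproduction of "lock held while blocked on I/O". All names are
// hypothetical; Derby's client is not implemented this way.
public final class HeldLockDuringReadModel {

    static boolean closeCouldProceed() throws InterruptedException {
        ReentrantLock connectionLock = new ReentrantLock();
        CountDownLatch replyArrived = new CountDownLatch(1); // never counted down
        CountDownLatch lockHeld = new CountDownLatch(1);

        Thread reader = new Thread(() -> {
            connectionLock.lock();
            try {
                lockHeld.countDown();
                replyArrived.await(); // stands in for the blocking socket read
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                connectionLock.unlock();
            }
        });
        reader.setDaemon(true);
        reader.start();
        lockHeld.await();

        // A real close() would block forever; tryLock with a timeout shows it instead.
        boolean acquired = connectionLock.tryLock(200, TimeUnit.MILLISECONDS);
        if (acquired) {
            connectionLock.unlock();
        }
        reader.interrupt(); // release the demo thread
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("close() acquired lock: " + closeCouldProceed());
    }
}
```

The same shape, a lock held across a wait that depends on another party, also describes the server-side chain, where "main" holds the session-table lock while waiting on a thread stuck in ActiveLock.waitForGrant.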

This message is automatically generated by JIRA.

