db-derby-dev mailing list archives

From "Bergquist, Brett" <BBergqu...@canoga.com>
Subject RE: Problem with a deadlock with Derby and Glassfish V2.1.1
Date Wed, 21 Dec 2011 23:14:10 GMT
I will get to this tomorrow, but I do see one comment in the code that I don't understand:

In DRDAConnThread.java, I see:

    if (severity > CodePoint.SVRCOD_ERROR)
        // For a session ending error > CodePoint.SRVCOD_ERROR you cannot
        // send a SQLERRRM. A CMDCHKRM is required.  In XA if there is a
        // lock timeout it ends the whole session. I am not sure this
        // is the correct behaviour but if it occurs we have to send
        // a CMDCHKRM instead of SQLERRM
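For readers unfamiliar with the DRDA reply messages, the branch quoted above can be modelled roughly as follows. This is a hypothetical, simplified sketch, not Derby's DRDAConnThread code: the class and method names are invented, and the severity values are local copies of the DRDA SVRCOD codes (ERROR = 8, SESDMG = 128) as given in the DRDA specification, so check them against Derby's CodePoint.java before relying on them.

    // Hypothetical model of the reply choice described in the DRDAConnThread comment.
    // Not Derby code; constants are local stand-ins for Derby's CodePoint values.
    final class ReplySelector {
        // DRDA severity codes (SVRCOD), assumed to match the DRDA specification.
        static final int SVRCOD_ERROR = 8;    // ordinary error: session survives
        static final int SVRCOD_SESDMG = 128; // session damage: session is ended

        private ReplySelector() {}

        /** Returns the name of the DRDA reply message to send for a given severity. */
        static String replyFor(int severity) {
            if (severity > SVRCOD_ERROR) {
                // Session-ending severity: per the comment, an SQLERRRM cannot be
                // sent, so the server answers with a CMDCHKRM instead.
                return "CMDCHKRM";
            }
            // Ordinary errors (for example a duplicate key) get an SQLERRRM.
            return "SQLERRRM";
        }
    }

If the comment is accurate, an XA lock timeout would take the CMDCHKRM path and end the session, which would match the "No Connection" error seen afterwards.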

So what does the comment "In XA if there is a lock timeout it ends the whole session" refer
to? Why would a lock timeout be any different from any other standard database error? It
is as if this comment is hinting at what is happening.

This is a real XA transaction.

What I see is that after the timeout is hit (I see it hit in Timeout.java), the error is propagated
to the app server. The app server then attempts to get the error text (I don't have the code
handy), which sends a request back to Derby. That request fails with a No Connection
error returned from Derby. It is as if, once this error is hit, the connection between
the app server and Derby no longer exists.
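When tracing which error the app server actually receives, it may help to bucket the exceptions by SQLState. A minimal sketch, assuming Derby's standard SQLStates: 40XL1 for a lock timeout, 40001 for a deadlock, 23505 for a duplicate key, and class 08 (connection exception) for a lost connection such as 08003 "No current connection". The class and enum names are hypothetical.

    import java.sql.SQLException;

    // Hypothetical helper for logging/diagnosis; not part of Derby or Glassfish.
    final class DerbyErrorKind {
        enum Kind { LOCK_TIMEOUT, DEADLOCK, DUPLICATE_KEY, CONNECTION_LOST, OTHER }

        private DerbyErrorKind() {}

        /** Classifies a single SQLException by its SQLState. */
        static Kind classify(SQLException e) {
            String state = e.getSQLState();
            if (state == null) {
                return Kind.OTHER;
            }
            switch (state) {
                case "40XL1": return Kind.LOCK_TIMEOUT;  // lock not obtained in time
                case "40001": return Kind.DEADLOCK;      // chosen as deadlock victim
                case "23505": return Kind.DUPLICATE_KEY; // duplicate in unique index
                default:
                    // Class 08 is "connection exception" in the SQL standard.
                    return state.startsWith("08") ? Kind.CONNECTION_LOST : Kind.OTHER;
            }
        }
    }

Logging the kind of both the original failure and the later error-text failure would show whether the sequence is really LOCK_TIMEOUT followed by CONNECTION_LOST.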

I am going to go try to follow through the code and get a smaller reproducible example.

From: Katherine Marsden [mailto:kmarsdenderby@sbcglobal.net]
Sent: Wednesday, December 21, 2011 3:41 PM
To: derby-dev@db.apache.org
Subject: Re: Problem with a deadlock with Derby and Glassfish V2.1.1

On 12/21/2011 12:04 PM, Bergquist, Brett wrote:
There is nothing in the Derby log other than it logging a deadlock with the statements involved,
a lock timeout with its statements, and an indication that cleanup had started and completed.

I will enable tracing with the documented (undocumented) system property. Thanks for that.

I will check for the XA transactions the next time I reproduce this.

Maybe you could point me to the correct area to look. This seems to be triggered by either
a lock timeout or a deadlock. The connection this is occurring on is an XA connection.
I see this logged in the server log, but I am trying to find out where it is logged from.
It seems that after this occurs, because of the way the connection pool is validated and
recreated on error by Glassfish (it is configured to do so), the pool gets into this state.
What I don't understand is why this type of error would cause the connection to appear to be
invalid, and I am trying to work through both the Glassfish source and the Derby source to
find out. The connection correctly handles other errors, such as an attempt to insert a
duplicate, and those do not cause the connection to appear to be invalid. So I am trying to
understand why a lock timeout or deadlock detection might do so.
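The pool behaviour described above can be modelled roughly as follows. This is a hypothetical sketch of a validate-on-error policy, not Glassfish's implementation; the probe stands in for whatever validation Glassfish is configured to run (for example a validation query or Connection.isValid). It illustrates why a duplicate-key error (23505) leaves the connection in the pool, while a lock timeout would only trigger recreation if the server had really ended the session.

    import java.sql.SQLException;
    import java.util.function.BooleanSupplier;

    // Hypothetical validate-on-error policy; names are invented for illustration.
    final class ValidateOnErrorPolicy {
        private ValidateOnErrorPolicy() {}

        /**
         * Decides whether a pooled connection should be destroyed after a failure.
         *
         * @param failure the exception the application saw
         * @param probe   returns true if the connection still works (e.g. wraps
         *                Connection.isValid(timeout) or a validation query)
         */
        static boolean shouldEvict(SQLException failure, BooleanSupplier probe) {
            String state = failure.getSQLState();
            if (state != null && state.startsWith("08")) {
                // Connection exception class: the connection is already known to be gone.
                return true;
            }
            // Otherwise ask the connection itself. A statement-level error leaves the
            // session intact, so the probe should pass; if the server ended the session,
            // the probe fails and the pool recreates the connection.
            return !probe.getAsBoolean();
        }
    }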

This problem has only cropped up recently, since they started performing multiple requests that
I know have a deadlock path through them. I can fix that problem later, but this is a
system-level problem that I need to resolve.

I really do appreciate the help and guidance and am willing to try to work through this.
I have to figure this out and patch either Glassfish or Derby in any case, as my customer (think
very, very large wireless carrier) is getting pretty PO'ed.
The one thing I can think of specifically with a deadlock is that it will automatically roll back
the victim transaction, and that might throw off the client logic regarding the state of the
server. But I would think that if there were just a simple problem with deadlocks, it would have
shown up before now. That said, I don't see any specific tests in our XA tests,
org.apache.derbyTesting.functionTests.tests.jdbcapi.XATest or
org.apache.derbyTesting.functionTests.tests.jdbcapi.XATransactionTest, that test XAConnections
with deadlocks.
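One way to picture the concern about the client's view of server state: SQLStates of class 40 mean "transaction rollback" in the SQL standard, so after 40001 the victim's work has already been rolled back on the server, while a statement-level error such as 23505 leaves the transaction active. The hypothetical helper below maps those cases onto the XA rollback codes defined in javax.transaction.xa.XAException. The mapping, especially lock timeout to XA_RBTIMEOUT, is an assumption for illustration, not what Derby or Glassfish is known to report.

    import java.util.OptionalInt;
    import javax.transaction.xa.XAException;

    // Hypothetical client-side bookkeeping; not Derby or Glassfish code.
    final class BranchState {
        private BranchState() {}

        /**
         * Returns the XA rollback code a transaction manager might expect if the
         * failure means the branch was already rolled back; empty if the branch
         * should still be active.
         */
        static OptionalInt rollbackCodeFor(String sqlState) {
            if ("40001".equals(sqlState)) {
                return OptionalInt.of(XAException.XA_RBDEADLOCK); // deadlock victim
            }
            if ("40XL1".equals(sqlState)) {
                // Assumed mapping: treating a lock timeout as XA_RBTIMEOUT is a guess.
                return OptionalInt.of(XAException.XA_RBTIMEOUT);
            }
            return OptionalInt.empty(); // e.g. 23505: only the statement failed
        }
    }

If the client or transaction manager still treats the branch as active after a 40001, its next call on that connection could see a state it does not expect.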

  Is this a local transaction on an XA connection or a real XA transaction with two-phase commit?

You might want to try testing an XAConnection with a simple deadlock case locally to
see if that reproduces the problem. org.apache.derbyTesting.functionTests.tests.lang.DeadlockDetectionTest
and org.apache.derbyTesting.functionTests.tests.lang have some examples of deadlocks.
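Driving an XAConnection by hand for such a test needs an Xid to pass to XAResource.start and XAResource.end. A minimal sketch of one, assuming nothing beyond the javax.transaction.xa.Xid interface; the class name SimpleXid is hypothetical, and test frameworks or app servers normally supply their own.

    import java.util.Arrays;
    import javax.transaction.xa.Xid;

    // Hypothetical Xid for a hand-written XA test.
    final class SimpleXid implements Xid {
        private final int formatId;
        private final byte[] gtrid; // global transaction id
        private final byte[] bqual; // branch qualifier

        SimpleXid(int formatId, byte[] gtrid, byte[] bqual) {
            // The Xid interface limits both parts to 64 bytes.
            if (gtrid.length > Xid.MAXGTRIDSIZE || bqual.length > Xid.MAXBQUALSIZE) {
                throw new IllegalArgumentException("gtrid or bqual too long");
            }
            this.formatId = formatId;
            this.gtrid = gtrid.clone();
            this.bqual = bqual.clone();
        }

        @Override public int getFormatId() { return formatId; }
        @Override public byte[] getGlobalTransactionId() { return gtrid.clone(); }
        @Override public byte[] getBranchQualifier() { return bqual.clone(); }

        // Xids are compared by value, so equal ids identify the same branch.
        @Override public boolean equals(Object o) {
            if (!(o instanceof Xid)) return false;
            Xid other = (Xid) o;
            return formatId == other.getFormatId()
                && Arrays.equals(gtrid, other.getGlobalTransactionId())
                && Arrays.equals(bqual, other.getBranchQualifier());
        }

        @Override public int hashCode() {
            return 31 * (31 * formatId + Arrays.hashCode(gtrid)) + Arrays.hashCode(bqual);
        }
    }

With it, a local reproduction could obtain two XAConnections from an XA data source (for the network server case, org.apache.derby.jdbc.ClientXADataSource), start a branch on each with XAResource.start(xid, XAResource.TMNOFLAGS) using different global transaction ids, and from two threads have branch 1 update row A then row B while branch 2 updates row B then row A. One branch should fail with 40001; the interesting part is whether that XAConnection, and the error-text lookup on it, still works afterwards.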


