db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bergquist, Brett" <BBergqu...@canoga.com>
Subject RE: Deadlock problem with Derby 10.8.2.2 in production
Date Mon, 14 May 2012 21:03:06 GMT
Not normal as it stayed this way for 2.5 days until the customer called me.  I took stack traces
of both the Derby Network Server and the Glassfish application server every 6 seconds for
one minute.  All show the exact same state.  

I do believe the Derby Network Server is okay at the point of the stack traces as customer
was able to shutdown the Derby Network Server and restart it to clear the problem, but until
this thread in the Glassfish Server which is a Derby client call, returns the system is locked:

"PMCollectionActivationManager.newPmTestResult" daemon prio=3 tid=0x000000010253e800 nid=0x243
runnable [0xfffffffd264fc000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:129)
	at org.apache.derby.client.net.Reply.fill(Unknown Source)
	at org.apache.derby.client.net.Reply.ensureALayerDataInBuffer(Unknown Source)
	at org.apache.derby.client.net.Reply.readDssHeader(Unknown Source)
	at org.apache.derby.client.net.Reply.startSameIdChainParse(Unknown Source)
	at org.apache.derby.client.net.NetXAConnectionReply.readLocalXACommit(Unknown Source)
	at org.apache.derby.client.net.NetXAConnection.readLocalXACommit_(Unknown Source)
	at org.apache.derby.client.net.NetXAConnection.readCommit(Unknown Source)
	at org.apache.derby.client.net.NetConnection.readXACommit_(Unknown Source)
	at org.apache.derby.client.am.Connection.readCommit(Unknown Source)
	at org.apache.derby.client.am.Connection.readAutoCommit(Unknown Source)
	at org.apache.derby.client.am.Connection.flowClose(Unknown Source)
	at org.apache.derby.client.am.Connection.closeResourcesX(Unknown Source)
	at org.apache.derby.client.am.Connection.closeResources(Unknown Source)
	- locked <0xfffffffdad887980> (a org.apache.derby.client.net.NetConnection40)
	at org.apache.derby.client.net.NetConnection.closeResources(Unknown Source)
	- locked <0xfffffffdad887980> (a org.apache.derby.client.net.NetConnection40)
	at org.apache.derby.client.ClientPooledConnection.close(Unknown Source)
	- locked <0xfffffffdad887860> (a org.apache.derby.client.ClientXAConnection40)
	at org.apache.derby.client.ClientXAConnection.close(Unknown Source)
	- locked <0xfffffffdad887860> (a org.apache.derby.client.ClientXAConnection40)
	at com.sun.gjc.spi.ManagedConnection.destroy(ManagedConnection.java:276)

>From looking at the Derby Network Server stack trace, I think there should be some indication
that Derby Network Server is doing something related to this but I see nothing in the threads
of the Derby Network Server.   It is as if the Derby Network Server has lost this XA transaction
and the response is never going to be returned to this client call.

So I guess the question is, with this state information, how can a client be waiting on the
server forever but the server has no indication that it is serving this client and will never
respond.   It seems to me if the socket connection were closed, this "read" would return with
an error, but this has not happened and after 2.5 days I don't think the server side is going
to respond.

Note the errors that are in the derby.log at this time:

	Thu May 10 08:10:52 EDT 2012 Thread[DRDAConnThread_23,5,main] (DATABASE = csemdb), (DRDAID
= ????????.??-736900152376645348{85251}), 	Execution failed because of a Distributed Protocol
Error:  DRDA_Proto_SYNTAXRM; CODPNT arg  = 0; Error Code Value = 3. Plaintext 	connection
attempt from an SSL enabled client?
	Thu May 10 08:10:52 EDT 2012 : Execution failed because of a Distributed Protocol Error:
 DRDA_Proto_SYNTAXRM; CODPNT arg  = 0; Error 	Code Value = 3. Plaintext connection attempt
from an SSL enabled client?
	Thu May 10 08:10:52 EDT 2012 : Execution failed because of a Distributed Protocol Error:
 DRDA_Proto_SYNTAXRM; CODPNT arg  = 0; Error 	Code Value = 3. Plaintext connection attempt
from an SSL enabled client?
	org.apache.derby.impl.drda.DRDAProtocolException: Execution failed because of a Distributed
Protocol Error:  DRDA_Proto_SYNTAXRM; CODPNT 	arg  = 0; Error Code Value = 3. Plaintext connection
attempt from an SSL enabled client?

There are no SSL enabled clients in use.  So it appears like there something in the protocol
broke and the client thread is waiting for something that is never going to be received.

Also suspicious is that 5 seconds before in the derby.log is:

There was an XA transaction associated with the connection being closed. The transaction is
going to be rolled back. The transaction Xid is (4871251,f19f0201d1f69e27636776776e6a73767230312c7365727665722c5033373030,636776776e6a73767230312c7365727665722c50333730302c00).
Exception in thread "DRDAConnThread_61" java.lang.NullPointerException
        at org.apache.derby.impl.jdbc.EmbedConnection.cancelRunningStatement(Unknown Source)
        at org.apache.derby.jdbc.XATransactionState.cancel(Unknown Source)
        at org.apache.derby.jdbc.ResourceAdapterImpl.cancelXATransaction(Unknown Source)
        at org.apache.derby.impl.drda.DRDAXAProtocol.rollbackCurrentTransaction(Unknown Source)
        at org.apache.derby.impl.drda.DRDAConnThread.closeSession(Unknown Source)
        at org.apache.derby.impl.drda.DRDAConnThread.run(Unknown Source)

Which indicates a NPE during the cleanup of an XA transaction.  So I suppose I should look
at how the NPE can happen and what effect that might have as far as responding to a client.
 Also I suppose I should look at what happens with the protocol exception and what effect
is might have on responding to a client.  Neither of the two threads referenced "DRDAConnThread_23"
or " DRDAConnThread_61" are present in the Derby Network Server stack trace either.

I do appreciate the you taking a look at this Dag.

-----Original Message-----
From: Dag H. Wanvik [mailto:dag.wanvik@oracle.com] 
Sent: Monday, May 14, 2012 3:12 PM
To: derby-dev@db.apache.org
Subject: Re: Deadlock problem with Derby 10.8.2.2 in production

"Bergquist, Brett" <BBergquist@canoga.com> writes:

> In our production environment, a deadlock occurred.   The problem was triggered by some
problem in derby, but unfortunately, the error message returned to the client is:
>
> [#|2012-05-10T08:10:47.761-0400|WARNING|sun-appserver2.1|org.eclipse.p
> ersistence.session.file:/opt/canoga/canogaview/glassfish/domains/domai
> n1/applications/j2ee-apps/csem/csem-core-api.jar-csem-PU|_ThreadID=66;
> _ThreadName=PMCollectionActivationManager.samResetDetected;_RequestID=
> 9e8852a5-9136-4096-94b1-b3ccbf95d810;|
> Local Exception Stack:
> Exception [EclipseLink-4002] (Eclipse Persistence Services - 
> 1.1.3.v20091002-r5404): 
> org.eclipse.persistence.exceptions.DatabaseException
> Internal Exception: java.sql.SQLException: DERBY SQL error: SQLCODE: 
> -1, SQLSTATE: XJ001, SQLERRMC: java.lang.NullPointerException  XJ001.U

Do you have access to derby.log at the time this happens?  

Of the server sessions threads DRDAConnThreads (597): 555 are cached on waiting on the server
side waiting to start servering new sessions, 42 seem to be involved in current connections
(blocked on socket read). Looks normal..

Dag



Mime
View raw message