db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bergquist, Brett" <BBergqu...@canoga.com>
Subject RE: [jira] [Commented] (DERBY-5552) Derby threads hanging when using ClientXADataSource and a deadlock or lock timeout occurs
Date Fri, 23 Dec 2011 22:49:48 GMT
If I disable the istat daemon, I still the the timeout but now the error is reported as it
should and the transaction is aborted and my code continues.  With the itstat deamon enabled,
then the lock timeout is discovered and signaled by exception being thrown but between the
time it gets caught and processed, the istat daemon has set the "isValid" property of the
statement to "false".  The original lock timeout exception is caught but then the 

    Activation.checkStatementValidity()

is called in the exception handler.  This finds that the statement is not valid (isValid ==
false) and throws its own exception indicating that a recompilation is required, swallowing
the original lock timeout exception in the process.  This causes the existing statement to
be rebuilt and re-executed but the locks are still present, the lock timeout will occur again,
the lock timeout exception will be thrown, the istat daemon will set the "isValid" to false
again, etc.  The process just continues forever.

The patch that I attached to the bug only calls

    Activation.checkStatementValidity()

in the exception handler if the caught exception is not set to REPORT_ALWAYS.  This seem okay
and to make sense because if the code is handling an exception that is to be REPORT_ALWAYS,
there is no sense looking for a statement that is invalid; just report the exception.

Brett

________________________________________
From: Kristian Waagan [kristian.waagan@oracle.com]
Sent: Friday, December 23, 2011 5:08 PM
To: derby-dev@db.apache.org
Subject: Re: [jira] [Commented] (DERBY-5552) Derby threads hanging when using ClientXADataSource
and a deadlock or lock timeout occurs

On 23.12.2011 17:49, Bergquist, Brett wrote:
> It is the IndexStatisticsDaemonImpl that is invalidating the statement.  To verify this,
I got my test case where it was in this loop.  With my printouts I was seeing that the isValid
was being set by the thread that was stuck right after the timeout occurred and then I immediately
saw makeInvalid being called with action = 40 (UPDATE_STATISTICS).  So I attached the debugger
to the database engine and set a breakpoint in IndexStatisticsDaemonImpl  and then set  "daemonDisabled"
to be "false".   Immediately the lock timeout got reported and the loop terminated and my
test case continued on the way it should.

I haven't followed this closely, just a few quick questions/observations
to check if I'm understanding this correctly:
  o Does disabling the istat daemon suffice as a workaround?
  o With the istat deamon disabled, are you still seeing timeouts?


Thanks,
--
Kristian

>
> So the IndexStatisticsDaemonImpl has the ability to invalidate statements in use by other
threads.  In this case, it is being invoked right after lock timeout occurs and before the
lock timeout exception is being processed.  The code sees that the statement is invalid and
swallows the lock exception and sets up to recompile and run the statement again.
>
> So the fix that I proposed and supplied a patch will handle the code not swallowing the
exception, but I am now concerned if the IndexStatisticsDaemonImpl might have another problem
if it is reaching into other threads :(
>
> -----Original Message-----
> From: Bergquist, Brett [mailto:BBergquist@canoga.com]
> Sent: Friday, December 23, 2011 10:54 AM
> To: derby-dev@db.apache.org
> Subject: RE: [jira] [Commented] (DERBY-5552) Derby threads hanging when using ClientXADataSource
and a deadlock or lock timeout occurs
>
> Well, I see that at the start of the execution, even on the next retry, the isValid is
true for the statement.  Something comes along and changes the state of isValid.  I am having
trouble finding out what with the debugger because there are quite a few threads running (statistics
daemon for example) and the makeInvalid method is being invoked from various places and threads.
  So catching the correct one is being difficult to I am going to drop back to the old "printf"
method and try to track the thread that sets the isValid to true and the thread that calls
makeInvalid and try to match them up.
>
> The fix definitely fixed my problem.  There is something in my code that is triggering
the deadlock and it is that statement that you mention below that is the one that is causing
problems.  But I want to understand why the fix is needed and how it gets into this state
in the first place.  The I will look at my code to figure out why this statement causes a
problem.
>
> -----Original Message-----
> From: Kathey Marsden (Commented) (JIRA) [mailto:jira@apache.org]
> Sent: Thursday, December 22, 2011 8:20 PM
> To: derby-dev@db.apache.org
> Subject: [jira] [Commented] (DERBY-5552) Derby threads hanging when using ClientXADataSource
and a deadlock or lock timeout occurs
>
>
>      [ https://issues.apache.org/jira/browse/DERBY-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175216#comment-13175216
]
>
> Kathey Marsden commented on DERBY-5552:
> ---------------------------------------
>
> Yes, definitely you want your test to be the new Junit style so that is the place to
look.
>
> I think I have solved the mystery of the missing XA client  traces and filed DERBY-5553
for that issue.
>
> I am trying to think of a scenario where we we execute a valid statement would get a
lock timeout and then at the time of processing the exception the statement would have become
invalid.  One reason for a statement becoming invalid would be for instance if a table had
been altered, for example if we had a prepared statement "SELECT * FROM TAB" and then a column
was added or dropped.  In that scenario the statement would be recompiled long before the
lock timeout occurred.  If the problem statement is the one with the lock timeout it would
seem to be:
>
>   SELECT DISTINCT t3.ID, t3.OPLOCK FROM PKG_9145E10G.PLPM_PAIR t3, PKG_9145E10G.PLPM_CHASSIS_9145E10G_USER
t2, CORE_V1.DEVICE_ENTITY_USER t1, CORE_V1.DEVICE_ENTITY t0 WHERE ((t0.ID = CAST (? AS INTEGER
)) AND (((t2.PLPM_PAIR_ID = t3.ID) AND ((t2.ID = t1.ID) AND (t1.DTYPE = 'PlpmChassis9145E10GUser')))
AND (t0.ID = t1.DEVICE_ENTITY_ID))) with 1 parameters begin parameter #1: 249360 :end parameter
> ERROR 40XL2: A lock could not be obtained within the time requested.  The lockTable dump
is:
> Thu Dec 22 07:03:07 PST 2011
>
>
> I am not sure how that could have gone invalid between the initial execution and the
lock time out.
>
>
>
>> Derby threads hanging when using ClientXADataSource and a deadlock or lock timeout
occurs
>> -----------------------------------------------------------------------------------------
>>
>>                  Key: DERBY-5552
>>                  URL: https://issues.apache.org/jira/browse/DERBY-5552
>>              Project: Derby
>>           Issue Type: Bug
>>           Components: Network Server
>>     Affects Versions: 10.8.1.2
>>          Environment: Solaris 10, Glassfish V2.1.1,
>>             Reporter: Brett Bergquist
>>             Priority: Blocker
>>          Attachments: appserverstack.txt, client.tar.Z, derby.log, derbystackatshutdown.txt,
execute.patch, transactionsleft.txt
>>
>>
>> The issue arrives when multiple XA transactions are done in parallel and there is
either a lock timeout or a lock deadlock detected.  When this happens the connection is leaked
in the Glassfish connection pool and the client thread hangs in "org.apache.derby.client.netReply.fill(Reply.java:172)".
>> Shutting down the app server fails because the thread has a lock in "org.apache.derby.client.net.NetConnection40"
and another task is calling "org.apache.derby.client.ClientPooledConnection.close(ClientPooledConnection.java:214)"
which is waiting for the lock.
>> Killing the appsever using "kill" and then attempting to shutdown Derby network server
causes the Network Server to hang.  One of the threads hangs waiting for a lock at "org.apache.derby.impl.drda.NeworkServerControlImpl.removeFromSessionTable(NetworkServerControlImpl.java:1525)"
and the "main" thread has this locked at "org.apache.derby.impl.drda.NetworkServerControlImpl.executeWork(NetworkServerControlImpl.java:2242)"
and it itself is waiting for a lock which belongs to a thread that is stuck at "org.apache.derby.impl.services.locks.ActiveLock.waitForGrant(ActiveLock.java:118)
which is in the TIMED_WAITING state.
>> Only by killing the Network Server using "kill" is possible at this point.
>> There are transactions left even though all clients have been removed.
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>
>




Mime
View raw message