db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Brett Bergquist (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-6955) All database threads stuck and not making any progress
Date Thu, 10 Aug 2017 15:26:01 GMT

    [ https://issues.apache.org/jira/browse/DERBY-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121792#comment-16121792

Brett Bergquist commented on DERBY-6955:

A quick look at the LogToFile.flush() method, it looks like there is two possible "wait" conditions.
 The first is waiting if the database is "frozen" and the second is waiting if the log is
being flushed.  There is comments that there may be multiple threads trying to flush the log,
so that would explain a bit why there are multiple threads that have acquired the lock: they
have acquired the lock and are waiting.

I do not see any thread that indicates that the database is frozen.   However, out backup
process does freeze the database, take a filesystem snapshot, and unfreeze the database. 
 It is possible that this occurred at this point.

I had questioned a while back that if a connection is lost while the database is frozen, the
database should be unfrozen automatically.   I have seen the database become frozen and no
connections can be made to unfreeze the database.

> All database threads stuck and not making any progress
> ------------------------------------------------------
>                 Key: DERBY-6955
>                 URL: https://issues.apache.org/jira/browse/DERBY-6955
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions:
>         Environment: Solaris 10
> Very large database (about 750G) with about 100 tx/second insertion rate
> Network Server 
>            Reporter: Brett Bergquist
>         Attachments: stacktrace1, stacktrace2
> A system in production got into a very strange state with none of the database engine
threads making any progress.   Initially a connection could be made to the network server,
but any query would hang.  The network server also hung on shutdown.
> The lack of progresss was observed by generating 6 stack traces, 10 seconds apart.  Each
of these stack traces show the threads exactly as in the previous stack trace.  I will attach
2 of the stack traces which are 10 seconds apart.
> About half of the threads were in a call chain the looks like:
> {code:java}
> "DRDAConnThread_161" #187 prio=5 os_prio=64 tid=0x0000000101f1c800 nid=0xd6 waiting on
condition [0xfffffffcad0fe000]
>    java.lang.Thread.State: WAITING (parking)
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0xfffffffefa4524c8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUninterruptibly(AbstractQueuedSynchronizer.java:1976)
> 	at org.apache.derby.impl.services.cache.CacheEntry.waitUntilIdentityIsSet(Unknown Source)
> 	at org.apache.derby.impl.services.cache.ConcurrentCache.getEntry(Unknown Source)
> 	at org.apache.derby.impl.services.cache.ConcurrentCache.find(Unknown Source)
> 	at org.apache.derby.impl.store.access.RAMAccessManager.conglomCacheFind(Unknown Source)
> 	at org.apache.derby.impl.store.access.RAMTransaction.findConglomerate(Unknown Source)
> 	at org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(Unknown
> 	at org.apache.derby.impl.store.access.RAMTransaction.openStoreCost(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.CompilerContextImpl.getStoreCostController(Unknown
> 	at org.apache.derby.impl.sql.compile.FromBaseTable.getStoreCostController(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.FromBaseTable.estimateCost(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.OptimizerImpl.estimateTotalCost(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.OptimizerImpl.costBasedCostOptimizable(Unknown
> 	at org.apache.derby.impl.sql.compile.OptimizerImpl.costOptimizable(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.FromBaseTable.optimizeIt(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.ProjectRestrictNode.optimizeIt(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.OptimizerImpl.costPermutation(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.SelectNode.optimize(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.DMLStatementNode.optimizeStatement(Unknown Source)
> 	at org.apache.derby.impl.sql.compile.CursorNode.optimizeStatement(Unknown Source)
> 	at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source)
> 	at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source)
> 	at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown
> 	at org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(Unknown Source)
> 	at org.apache.derby.impl.jdbc.EmbedPreparedStatement20.<init>(Unknown Source)
> 	at org.apache.derby.impl.jdbc.EmbedPreparedStatement30.<init>(Unknown Source)
> 	at org.apache.derby.impl.jdbc.EmbedPreparedStatement40.<init>(Unknown Source)
> 	at org.apache.derby.jdbc.Driver40.newEmbedPreparedStatement(Unknown Source)
> 	at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(Unknown Source)
> 	- locked <0xffffffff12e07950> (a org.apache.derby.impl.jdbc.EmbedConnection40)
> 	at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(Unknown Source)
> 	at org.apache.derby.impl.drda.DRDAStatement.prepare(Unknown Source)
> 	at org.apache.derby.impl.drda.DRDAStatement.explicitPrepare(Unknown Source)
> 	at org.apache.derby.impl.drda.DRDAConnThread.parsePRPSQLSTT(Unknown Source)
> 	at org.apache.derby.impl.drda.DRDAConnThread.processCommands(Unknown Source)
> 	at org.apache.derby.impl.drda.DRDAConnThread.run(Unknown Source)
> {code}
> and the majority of the others look like:
> {code:java}
> "DRDAConnThread_118" #144 prio=5 os_prio=64 tid=0x0000000101072800 nid=0xab in Object.wait()
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	at java.lang.Object.wait(Object.java:502)
> 	at org.apache.derby.impl.store.raw.log.LogToFile.flush(Unknown Source)
> 	- locked <0xfffffffd605393c0> (a org.apache.derby.impl.store.raw.log.LogToFile)
> 	at org.apache.derby.impl.store.raw.log.LogToFile.flush(Unknown Source)
> 	at org.apache.derby.impl.store.raw.log.FileLogger.flush(Unknown Source)
> 	at org.apache.derby.impl.store.raw.xact.Xact.prepareCommit(Unknown Source)
> 	at org.apache.derby.impl.store.raw.xact.Xact.xa_commit(Unknown Source)
> 	at org.apache.derby.impl.store.access.RAMTransaction.xa_commit(Unknown Source)
> 	at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.doCommit(Unknown
> 	at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.xaCommit(Unknown
> 	at org.apache.derby.impl.jdbc.EmbedConnection.xa_commit(Unknown Source)
> 	- locked <0xfffffffef72c71d8> (a org.apache.derby.impl.jdbc.EmbedConnection40)
> 	at org.apache.derby.jdbc.XATransactionState.xa_commit(Unknown Source)
> 	- eliminated <0xfffffffefa7b2958> (a org.apache.derby.jdbc.XATransactionState)
> 	at org.apache.derby.jdbc.EmbedXAResource.commit(Unknown Source)
> 	- locked <0xfffffffefa7b2958> (a org.apache.derby.jdbc.XATransactionState)
> 	- locked <0xffffffff5343b4c0> (a org.apache.derby.jdbc.EmbedXAResource)
> 	at org.apache.derby.impl.drda.DRDAXAProtocol.commitXATransaction(Unknown Source)
> 	at org.apache.derby.impl.drda.DRDAXAProtocol.commitTransaction(Unknown Source)
> 	at org.apache.derby.impl.drda.DRDAXAProtocol.parseSYNCCTL(Unknown Source)
> 	at org.apache.derby.impl.drda.DRDAConnThread.processCommands(Unknown Source)
> 	at org.apache.derby.impl.drda.DRDAConnThread.run(Unknown Source)
> {code}
> An interesting thing with the second one is the lock on
> - locked <0xfffffffd605393c0> (a org.apache.derby.impl.store.raw.log.LogToFile)
> Multiple threads claim to have locked this object which I find concerning as one would
assume that only one thread could have a lock at a time.   See the attached stack trace and
you will see that there are many threads that seem to have locked this object and are in the
same state.

This message was sent by Atlassian JIRA

View raw message