db-derby-dev mailing list archives

From "Dag H. Wanvik (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DERBY-5185) store/rollForwardRecovery.sql stuck in RAFContainer4.recoverContainerAfterInterrupt() during shutdown
Date Fri, 15 Apr 2011 13:51:05 GMT

     [ https://issues.apache.org/jira/browse/DERBY-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dag H. Wanvik updated DERBY-5185:
---------------------------------

    Attachment: derby-5185-2a.stat
                derby-5185-2a.diff

Scenario: A thread has seen an interrupt and is getting ready to resurrect the channel in
"recoverContainerAfterInterrupt". Before it does so, however, it waits for any other
threads currently doing IO on this container to "hit the wall" and start waiting for the current
thread to do the recovery. The current thread knows it's free to proceed when the counter "threadsInPageIO"
reaches 0. In this case, it has given up waiting for the counter to reach 0 and throws FILE_IO_INTERRUPTED.
(I believe "threadsInPageIO" should have been 0 here but is not for some reason, see below.)
In throwing, it neglects to reset the state variable "restoreChannelInProgress", which makes the
next thread coming along (the one we see hanging in this issue) get stuck when trying to enter
the IO code in the "gain entry" section.
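
To make the scenario concrete, here is a minimal sketch of the invariant problem. It is a simplified model, not the Derby source: the class RecoverySketch, the methods tryEnterPageIO, leavePageIO and recover, and the resetOnGiveUp switch are all hypothetical, and only the names "threadsInPageIO", "restoreChannelInProgress" and FILE_IO_INTERRUPTED come from the description above.

```java
import java.io.IOException;

// Simplified, hypothetical model of the state described above; not Derby's code.
class RecoverySketch {
    private final Object monitor = new Object();
    private int threadsInPageIO;               // threads currently inside page IO
    private boolean restoreChannelInProgress;  // set while a thread recovers the channel

    // "Gain entry": wait until no recovery is flagged, then count ourselves in.
    // Returns false if the flag is still set after timeoutMillis (the "stuck" case).
    boolean tryEnterPageIO(long timeoutMillis) {
        synchronized (monitor) {
            long deadline = System.currentTimeMillis() + timeoutMillis;
            while (restoreChannelInProgress) {
                long left = deadline - System.currentTimeMillis();
                if (left <= 0) {
                    return false;
                }
                try {
                    monitor.wait(left);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            threadsInPageIO++;
            return true;
        }
    }

    void leavePageIO() {
        synchronized (monitor) {
            threadsInPageIO--;
            monitor.notifyAll();
        }
    }

    // Recovery: wait (a bounded number of times) for other IO threads to drain.
    // With resetOnGiveUp == false the flag stays set when we give up, which
    // models the problem described above; with true, the flag is always reset.
    void recover(int maxWaits, boolean resetOnGiveUp) throws IOException {
        synchronized (monitor) {
            restoreChannelInProgress = true;
            boolean success = false;
            try {
                for (int waits = 0; threadsInPageIO > 0; waits++) {
                    if (waits >= maxWaits) {
                        throw new IOException("FILE_IO_INTERRUPTED (sketch)");
                    }
                    monitor.wait(10);
                }
                // The channel would be reopened here.
                success = true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while waiting (sketch)", e);
            } finally {
                if (success || resetOnGiveUp) {
                    restoreChannelInProgress = false;
                    monitor.notifyAll();
                }
            }
        }
    }
}
```

In this model, a failed recover(n, false) leaves every later tryEnterPageIO call timing out, whereas recover(n, true) lets the next thread in even though recovery itself failed.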

Attaching a patch, derby-5185-2a, which fixes state invariant maintenance when throwing FILE_IO_INTERRUPTED.
It also adds a maximum number of retries for the readPage code and fixes some cases where
the state variable "threadsInPageIO" risked not being properly updated when exceptions
were thrown. This may be the underlying reason for what we see here.
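
As a rough illustration of the two ideas in this paragraph (a bounded number of retries, and a counter that is updated even when an exception is thrown), the sketch below uses try/finally around the IO call. It is not the patch itself: BoundedRetrySketch, PageRead, readPageWithRetries and MAX_RETRIES are hypothetical names, the retry limit of 3 is an arbitrary assumption, and retrying on any IOException is a simplification.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; names and the retry limit are assumptions, not Derby's.
class BoundedRetrySketch {
    interface PageRead {
        void run() throws IOException;
    }

    static final int MAX_RETRIES = 3; // assumed limit, for illustration only

    final AtomicInteger threadsInPageIO = new AtomicInteger();

    // Runs the read up to MAX_RETRIES times and returns the attempt that succeeded.
    // The counter is decremented in finally, so it is restored on every path,
    // including when the read throws.
    int readPageWithRetries(PageRead read) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            threadsInPageIO.incrementAndGet();
            try {
                read.run();
                return attempt;
            } catch (IOException e) {
                last = e; // remember the failure and try again
            } finally {
                threadsInPageIO.decrementAndGet();
            }
        }
        throw new IOException("giving up after " + MAX_RETRIES + " attempts", last);
    }
}
```

Whether the read succeeds, fails once, or fails every time, the counter is back at its starting value when the method returns or throws, so a later recovery attempt would not wait on a count that can never reach 0.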

Ran regressions OK on Solaris/JDK6 and Debian/JDK7.

> store/rollForwardRecovery.sql stuck in RAFContainer4.recoverContainerAfterInterrupt() during shutdown
> -----------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-5185
>                 URL: https://issues.apache.org/jira/browse/DERBY-5185
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.9.0.0
>         Environment: Derby 10.9.0.0 alpha - (1090406)
> Oracle Enterprise Linux 6.0
> Linux 2.6.32-100.28.9.el6.x86_64 #1 SMP Wed Mar 16 19:24:16 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
> java version "1.7.0-ea"
> Java(TM) SE Runtime Environment (build 1.7.0-ea-b135)
> Java HotSpot(TM) Client VM (build 21.0-b05, mixed mode, sharing)
>            Reporter: Knut Anders Hatlen
>         Attachments: derby-5185-1a.diff, derby-5185-2a.diff, derby-5185-2a.stat, stack.txt
>
>
> I have a derbyall that has been running for more than two days now. It seems to be stuck in the store/rollForwardRecovery.sql test while the engine is shutting down.
> Here's the stack trace for the daemon thread that's stuck:
> "derby.rawStoreDaemon" daemon prio=10 tid=0xf3e7dc00 nid=0x3505 waiting on condition [0xf4066000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.derby.impl.store.raw.data.RAFContainer4.recoverContainerAfterInterrupt(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.RAFContainer4.readPage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.RAFContainer4.readPage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.CachedPage.readPage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.CachedPage.setIdentity(Unknown Source)
>         at org.apache.derby.impl.services.cache.ConcurrentCache.find(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.FileContainer.getAllocPage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.BaseContainer.getAllocPage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.BaseContainerHandle.getAllocPage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.FileContainer.deallocatePagenum(Unknown Source)
>         - locked <0xc5adbce8> (a org.apache.derby.impl.store.raw.data.AllocationCache)
>         at org.apache.derby.impl.store.raw.data.FileContainer.deallocatePage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.BaseContainer.removePage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.BaseContainerHandle.removePage(Unknown Source)
>         at org.apache.derby.impl.store.access.heap.HeapController.removePage(Unknown Source)
>         at org.apache.derby.impl.store.access.heap.HeapPostCommit.purgeCommittedDeletes(Unknown Source)
>         at org.apache.derby.impl.store.access.heap.HeapPostCommit.performWork(Unknown Source)
>         at org.apache.derby.impl.services.daemon.BasicDaemon.serviceClient(Unknown Source)
>         at org.apache.derby.impl.services.daemon.BasicDaemon.work(Unknown Source)
>         at org.apache.derby.impl.services.daemon.BasicDaemon.run(Unknown Source)
>         at java.lang.Thread.run(Thread.java:722)
> And here's the stack trace for the main thread, which is waiting for the daemon thread to stop:
> "main" prio=10 tid=0xf6c05c00 nid=0x34e5 in Object.wait() [0xf6dbe000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0xc5ac5760> (a org.apache.derby.impl.services.daemon.BasicDaemon)
>         at java.lang.Object.wait(Object.java:504)
>         at org.apache.derby.impl.services.daemon.BasicDaemon.pause(Unknown Source)
>         - locked <0xc5ac5760> (a org.apache.derby.impl.services.daemon.BasicDaemon)
>         at org.apache.derby.impl.services.daemon.BasicDaemon.stop(Unknown Source)
>         at org.apache.derby.impl.store.raw.RawStore.stop(Unknown Source)
>         at org.apache.derby.impl.services.monitor.TopService.stop(Unknown Source)
>         at org.apache.derby.impl.services.monitor.TopService.shutdown(Unknown Source)
>         at org.apache.derby.impl.services.monitor.BaseMonitor.shutdown(Unknown Source)
>         at org.apache.derby.impl.services.monitor.BaseMonitor.shutdown(Unknown Source)
>         at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
>         at org.apache.derby.jdbc.AutoloadedDriver.connect(Unknown Source)
>         at java.sql.DriverManager.getConnection(DriverManager.java:620)
>         at java.sql.DriverManager.getConnection(DriverManager.java:222)
>         at org.apache.derby.impl.tools.ij.utilMain.cleanupGo(Unknown Source)
>         at org.apache.derby.impl.tools.ij.utilMain.go(Unknown Source)
>         at org.apache.derby.impl.tools.ij.Main.go(Unknown Source)
>         at org.apache.derby.impl.tools.ij.Main.mainCore(Unknown Source)
>         at org.apache.derby.impl.tools.ij.Main.main(Unknown Source)
>         at org.apache.derby.tools.ij.main(Unknown Source)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
