db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dag H. Wanvik (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-5185) store/rollForwardRecovery.sql stuck in RAFContainer4.recoverContainerAfterInterrupt() during shutdown
Date Mon, 11 Apr 2011 15:10:05 GMT

    [ https://issues.apache.org/jira/browse/DERBY-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018382#comment-13018382
] 

Dag H. Wanvik commented on DERBY-5185:
--------------------------------------

The hang occurs in this piece of looping code (cf the sleep)

       // Wait till other concurrent threads hit the wall
        // (ClosedChannelException) and are a ready wait for us to clean up, so
        // we can set them loose when we're done.
        while (true) {
            synchronized (channelCleanupMonitor) {
                if (threadsInPageIO == 0) {
                    // Either no concurrent threads, or they are now waiting
                    // for us to clean up (see ClosedChannelException case)
                    break;
                }
            }

            try {
                Thread.sleep(10);
            } catch (InterruptedException te) {
                InterruptStatus.setInterrupted();
            }
        }

Looking at VM running, we can see that there are only two Derby threads still live, and only
the one that loops is in the NIO code.
This means that the counter will never reach zero. Again looking at the live VM's heap, there
are several containers whose counters are positive, which they are not supposed to be, since
we have only this one thread inside the NIO code. The container  for whom the CacheEntry#keepCount
> 0 does have a value threadsInPageIO > 0. This is probably the one that sees the hang
during HeapPostCommit#purgeComittedDeletes.

This is not good, since this happens during shutdown without any interrupts being issued by
the application code (Derby uses interrupts during shutdown to stop threads). So, this is
a regression in that earlier, this would only result in an IO error during shutdown (masked),
wheres as now we see a hang. 

I will a) commit a patch to make the loop bounded so we can get the regression out of the
way, then b) look for the reason why the counter gets wrong.

> store/rollForwardRecovery.sql stuck in RAFContainer4.recoverContainerAfterInterrupt()
during shutdown
> -----------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-5185
>                 URL: https://issues.apache.org/jira/browse/DERBY-5185
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.9.0.0
>         Environment: Derby 10.9.0.0 alpha - (1090406)
> Oracle Enterprise Linux 6.0
> Linux 2.6.32-100.28.9.el6.x86_64 #1 SMP Wed Mar 16 19:24:16 EDT 2011 x86_64 x86_64 x86_64
GNU/Linux
> java version "1.7.0-ea"
> Java(TM) SE Runtime Environment (build 1.7.0-ea-b135)
> Java HotSpot(TM) Client VM (build 21.0-b05, mixed mode, sharing)
>            Reporter: Knut Anders Hatlen
>         Attachments: stack.txt
>
>
> I have a derbyall that has been running for more than two days now. It seems to be stuck
in the store/rollForwardRecovery.sql test while the engine is shutting down.
> Here's the stack trace for the daemon thread that's stuck:
> "derby.rawStoreDaemon" daemon prio=10 tid=0xf3e7dc00 nid=0x3505 waiting on condition
[0xf4066000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.derby.impl.store.raw.data.RAFContainer4.recoverContainerAfterInterrupt(Unknown
Source)
>         at org.apache.derby.impl.store.raw.data.RAFContainer4.readPage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.RAFContainer4.readPage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.CachedPage.readPage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.CachedPage.setIdentity(Unknown Source)
>         at org.apache.derby.impl.services.cache.ConcurrentCache.find(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.FileContainer.getAllocPage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.BaseContainer.getAllocPage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.BaseContainerHandle.getAllocPage(Unknown
Source)
>         at org.apache.derby.impl.store.raw.data.FileContainer.deallocatePagenum(Unknown
Source)
>         - locked <0xc5adbce8> (a org.apache.derby.impl.store.raw.data.AllocationCache)
>         at org.apache.derby.impl.store.raw.data.FileContainer.deallocatePage(Unknown
Source)
>         at org.apache.derby.impl.store.raw.data.BaseContainer.removePage(Unknown Source)
>         at org.apache.derby.impl.store.raw.data.BaseContainerHandle.removePage(Unknown
Source)
>         at org.apache.derby.impl.store.access.heap.HeapController.removePage(Unknown
Source)
>         at org.apache.derby.impl.store.access.heap.HeapPostCommit.purgeCommittedDeletes(Unknown
Source)
>         at org.apache.derby.impl.store.access.heap.HeapPostCommit.performWork(Unknown
Source)
>         at org.apache.derby.impl.services.daemon.BasicDaemon.serviceClient(Unknown Source)
>         at org.apache.derby.impl.services.daemon.BasicDaemon.work(Unknown Source)
>         at org.apache.derby.impl.services.daemon.BasicDaemon.run(Unknown Source)
>         at java.lang.Thread.run(Thread.java:722)
> And here's the stack trace for the main thread, which is waiting for the daemon thread
to stop:
> "main" prio=10 tid=0xf6c05c00 nid=0x34e5 in Object.wait() [0xf6dbe000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0xc5ac5760> (a org.apache.derby.impl.services.daemon.BasicDaemon)
>         at java.lang.Object.wait(Object.java:504)
>         at org.apache.derby.impl.services.daemon.BasicDaemon.pause(Unknown Source)
>         - locked <0xc5ac5760> (a org.apache.derby.impl.services.daemon.BasicDaemon)
>         at org.apache.derby.impl.services.daemon.BasicDaemon.stop(Unknown Source)
>         at org.apache.derby.impl.store.raw.RawStore.stop(Unknown Source)
>         at org.apache.derby.impl.services.monitor.TopService.stop(Unknown Source)
>         at org.apache.derby.impl.services.monitor.TopService.shutdown(Unknown Source)
>         at org.apache.derby.impl.services.monitor.BaseMonitor.shutdown(Unknown Source)
>         at org.apache.derby.impl.services.monitor.BaseMonitor.shutdown(Unknown Source)
>         at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
>         at org.apache.derby.jdbc.AutoloadedDriver.connect(Unknown Source)
>         at java.sql.DriverManager.getConnection(DriverManager.java:620)
>         at java.sql.DriverManager.getConnection(DriverManager.java:222)
>         at org.apache.derby.impl.tools.ij.utilMain.cleanupGo(Unknown Source)
>         at org.apache.derby.impl.tools.ij.utilMain.go(Unknown Source)
>         at org.apache.derby.impl.tools.ij.Main.go(Unknown Source)
>         at org.apache.derby.impl.tools.ij.Main.mainCore(Unknown Source)
>         at org.apache.derby.impl.tools.ij.Main.main(Unknown Source)
>         at org.apache.derby.tools.ij.main(Unknown Source)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message