db-derby-dev mailing list archives

From "Dag H. Wanvik (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (DERBY-4741) Make Derby work reliably in the presence of thread interrupts
Date Wed, 20 Oct 2010 23:52:24 GMT

    [ https://issues.apache.org/jira/browse/DERBY-4741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923238#action_12923238 ]

Dag H. Wanvik edited comment on DERBY-4741 at 10/20/10 7:50 PM:
----------------------------------------------------------------

Note to self: I have found a problem with the interrupt recovery of
RAFContainer4. Its call to openContainer needs to be protected by the
monitor on FileContainer#allocCache, because opening a container
eventually leads to a call to AllocationCache#reset. The AllocationCache
javadoc states that callers need to synchronize themselves, since the
class itself is not MT safe.

Threads inside RAFContainer4#{readPage, writePage} do not necessarily
own this monitor when recovery is attempted.

I did once see a race condition due to this. In that race, a thread
was trying to write to a new page and got an array-index-out-of-bounds
exception inside AllocationCache.validate (numExtents was suddenly back
to 0), because another thread was doing interrupt recovery by calling
RAFContainer4#recoverContainerAfterInterrupt ->
openContainer -> ... -> AllocationCache#reset (unprotected).
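To make the required contract concrete, here is a minimal, hypothetical sketch in plain Java. `ToyAllocCache` is not Derby's AllocationCache; it only mimics the shape described above (a `numExtents` counter that `reset` puts back to 0, and an array that `validate` indexes through it). Because the class is not MT safe, every access, including the one made on the "recovery" path, happens while holding the same monitor (`synchronized (cache)`), so a reader never sees `numExtents` and the array in inconsistent states.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a non-thread-safe cache; NOT Derby's AllocationCache.
final class ToyAllocCache {
    private int numExtents;
    private long[] extents = new long[0];

    // Drops all cached extents (numExtents back to 0), like the reset described above.
    void reset() {
        numExtents = 0;
        extents = new long[0];
    }

    // Rebuilds the cache with n extents.
    void load(int n) {
        extents = new long[n];
        numExtents = n;
    }

    // Indexes the array through numExtents; an unsynchronized concurrent reset()
    // can make the two disagree and cause an out-of-bounds access.
    long validate() {
        long sum = 0;
        for (int i = 0; i < numExtents; i++) {
            sum += extents[i];
        }
        return sum;
    }
}

final class GuardedCacheDemo {
    // Even-numbered workers play "interrupt recovery" (reset + reload), odd ones
    // play "writer" (validate). All access is under synchronized (cache).
    // Returns the number of exceptions observed.
    static int run(int threads, int iterations) {
        ToyAllocCache cache = new ToyAllocCache();
        synchronized (cache) {
            cache.load(8);
        }
        AtomicInteger failures = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final boolean recoverer = (t % 2 == 0);
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    try {
                        synchronized (cache) {
                            if (recoverer) {
                                cache.reset();
                                cache.load(8);
                            } else {
                                cache.validate();
                            }
                        }
                    } catch (RuntimeException e) {
                        failures.incrementAndGet();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println("failures: " + run(4, 100_000));
    }
}
```

Dropping the `synchronized (cache)` block from the worker loop re-exposes the kind of race described above, although whether a given run actually hits an exception is timing-dependent.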

[edit add]:
Simply enveloping recoverContainerAfterInterrupt's call to openContainer
in synchronized(allocCache) won't work: it can lead to deadlock.
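The comment does not say which locks take part in that deadlock. A common way that wrapping an existing call in a new `synchronized` block deadlocks is inconsistent lock ordering: one thread holds monitor A and waits for B, while another holds B and waits for A. The sketch below is purely illustrative: `lockA` and `lockB` could stand in for the allocCache monitor and some other monitor taken further down inside openContainer, but that mapping is an assumption. It forces the two-thread cycle with a latch and confirms it with the JDK's `ThreadMXBean.findDeadlockedThreads()`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

final class LockOrderDeadlockDemo {
    // Starts two daemon threads that take lockA/lockB in opposite orders and
    // returns how many threads the JVM reports as deadlocked (0 if none is seen
    // within the timeout). Daemon threads let the JVM exit afterwards.
    static int provokeAndDetect(long timeoutMillis) {
        Object lockA = new Object(); // hypothetical: e.g. the allocCache monitor
        Object lockB = new Object(); // hypothetical: another monitor taken inside openContainer
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread t1 = new Thread(() -> {
            synchronized (lockA) {
                awaitQuietly(bothHoldFirstLock);
                synchronized (lockB) { /* never reached: t2 holds lockB */ }
            }
        }, "holds-A-wants-B");
        Thread t2 = new Thread(() -> {
            synchronized (lockB) {
                awaitQuietly(bothHoldFirstLock);
                synchronized (lockA) { /* never reached: t1 holds lockA */ }
            }
        }, "holds-B-wants-A");
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is found
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return 0;
    }

    // Signals that this thread holds its first lock, then waits for the other one.
    private static void awaitQuietly(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + provokeAndDetect(5_000));
    }
}
```

The usual remedies are to acquire monitors in one fixed global order, or to avoid holding one monitor while calling code that may take another; whether either fits RAFContainer4 depends on lock usage not described in this comment.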

  
> Make Derby work reliably in the presence of thread interrupts
> -------------------------------------------------------------
>
>                 Key: DERBY-4741
>                 URL: https://issues.apache.org/jira/browse/DERBY-4741
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.2.1.6, 10.2.2.0, 10.3.1.4, 10.3.2.1, 10.3.3.0, 10.4.1.3, 10.4.2.0, 10.5.1.1, 10.5.2.0, 10.5.3.0, 10.6.1.0
>            Reporter: Dag H. Wanvik
>            Assignee: Dag H. Wanvik
>         Attachments: derby-4741-all+lenient+resurrect.diff, derby-4741-all+lenient+resurrect.stat,
>                      derby-4741-nio-container+log+waits+locks+throws.diff, derby-4741-nio-container+log+waits+locks+throws.stat,
>                      derby-4741-nio-container+log+waits+locks-2.diff, derby-4741-nio-container+log+waits+locks-2.stat,
>                      derby-4741-nio-container+log+waits+locks.diff, derby-4741-nio-container+log+waits+locks.stat,
>                      derby-4741-nio-container+log+waits.diff, derby-4741-nio-container+log+waits.stat,
>                      derby-4741-nio-container+log.diff, derby-4741-nio-container+log.stat,
>                      derby-4741-nio-container-2.diff, derby-4741-nio-container-2.log, derby-4741-nio-container-2.stat,
>                      derby-4741-nio-container-2b.diff, derby-4741-nio-container-2b.stat,
>                      derby.log, derby.log, MicroAPITest.java, xsbt0.log.gz
>
>
> When not executing on a small device VM, Derby has been using the Java NIO classes java.nio.channels.*
> for file I/O.
> If a thread is interrupted while executing a blocking I/O operation in NIO, the channel is closed and
> a ClosedByInterruptException is thrown. Unfortunately, Derby isn't currently architected to retry and
> complete such operations (before passing on the interrupt), so the Derby database can be left in an
> inconsistent state and we therefore have to return a database-level error. This means applications
> can no longer access the database without a shutdown and reboot, including recovery.
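The NIO behavior described here can be reproduced with plain JDK classes. This minimal sketch (class and method names are made up for illustration) sets the current thread's interrupt status and then writes through a `FileChannel`. Per the `InterruptibleChannel` contract, if the interrupt status is already set when a blocking operation starts, the channel is closed, `ClosedByInterruptException` is thrown, and the interrupt status remains set.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class InterruptClosesChannelDemo {
    // Returns a short description of what happened; cleans up its temp file and
    // leaves the caller's interrupt flag cleared.
    static String writeWhileInterrupted() {
        try {
            Path file = Files.createTempFile("derby4741-demo", ".dat");
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
                String outcome;
                // Simulate an application calling Thread.interrupt() on this thread.
                Thread.currentThread().interrupt();
                try {
                    ch.write(ByteBuffer.wrap("page".getBytes(StandardCharsets.US_ASCII)), 0L);
                    outcome = "write succeeded";
                } catch (ClosedByInterruptException e) {
                    outcome = "ClosedByInterruptException";
                }
                // The channel is now closed for every thread sharing it, and the
                // interrupt flag is still set; Thread.interrupted() reads AND clears it.
                return outcome + "; open=" + ch.isOpen() + "; interrupted=" + Thread.interrupted();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeWhileInterrupted());
    }
}
```

Any other thread that shares the same channel object then gets ClosedChannelException on its next operation, which is why recovery has to reopen the container rather than simply retry on the old channel.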
> It would be nice if Derby could somehow detect and finish I/O operations that are underway when
> thread interrupts happen, before passing the exception on to the application. Embedded Derby
> is sometimes used in applications that use Thread.interrupt to stop threads.
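One way to "finish IO operations underway" can be sketched as follows, under stated assumptions: the hypothetical class `RetryAfterInterrupt` owns its channel (so it may reopen it), uses positional writes so a retry does not depend on the channel's position, clears the interrupt flag before retrying (otherwise the reopened channel would be closed again immediately), and re-asserts the flag once the write has completed so the application still sees the interrupt. This is not Derby's actual fix; a real store also has to coordinate the other threads sharing the closed channel, which is exactly the synchronization problem discussed in the comment above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical single-threaded writer; not Derby code.
final class RetryAfterInterrupt {
    private final Path file;
    private FileChannel channel;

    RetryAfterInterrupt(Path file) throws IOException {
        this.file = file;
        this.channel = FileChannel.open(file, StandardOpenOption.WRITE);
    }

    // Writes all remaining bytes of `data` starting at file offset `position`,
    // reopening the channel whenever an interrupt closes it. The interrupt is
    // remembered and re-asserted at the end.
    void writeFullyDespiteInterrupts(ByteBuffer data, long position) throws IOException {
        boolean sawInterrupt = false;
        int start = data.position();
        try {
            while (data.hasRemaining()) {
                try {
                    // Derive the file offset from the buffer position so bytes that were
                    // written before an interrupt are not written at the wrong place.
                    channel.write(data, position + (data.position() - start));
                } catch (ClosedByInterruptException e) {
                    sawInterrupt = true;
                    Thread.interrupted(); // clear the flag before reopening
                    channel = FileChannel.open(file, StandardOpenOption.WRITE);
                }
            }
        } finally {
            if (sawInterrupt) {
                Thread.currentThread().interrupt(); // pass the interrupt on to the caller
            }
        }
    }

    void close() throws IOException {
        channel.close();
    }

    // Interrupts the current thread, writes "page", and reports the file contents
    // plus whether the interrupt flag survived (the flag is cleared on return).
    static String demo() {
        try {
            Path file = Files.createTempFile("derby4741-retry", ".dat");
            try {
                RetryAfterInterrupt w = new RetryAfterInterrupt(file);
                Thread.currentThread().interrupt();
                try {
                    w.writeFullyDespiteInterrupts(
                            ByteBuffer.wrap("page".getBytes(StandardCharsets.US_ASCII)), 0L);
                } finally {
                    w.close();
                }
                boolean interrupted = Thread.interrupted(); // clear before further NIO reads
                return new String(Files.readAllBytes(file), StandardCharsets.US_ASCII)
                        + "; interrupted=" + interrupted;
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The design choice is to defer the interrupt rather than swallow it: the I/O completes, and the caller still observes `isInterrupted() == true` afterwards.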

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

