db-derby-dev mailing list archives

From "Dag H. Wanvik (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DERBY-5325) Checkpoint fails with ClosedChannelException in InterruptResilienceTest
Date Thu, 14 Jul 2011 22:16:00 GMT

     [ https://issues.apache.org/jira/browse/DERBY-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dag H. Wanvik updated DERBY-5325:
---------------------------------

    Attachment: derby-5325a.stat
                derby-5325a.diff

Uploading a patch for this issue, derby-5325a.

With NIO, writeRAFHeader calls two methods that lead to interruptible IO:
 - getEmbryonicPage
 - writeHeader
 
Currently, getEmbryonicPage may throw InterruptDetectedException, and hence so may writeRAFHeader.

writeHeader may throw ClosedByInterruptException, AsynchronousCloseException and ClosedChannelException
because writeHeader does not use RAFContainer4#writePage, but rather uses RAFContainer4#writeAtOffset,
which does not currently attempt to recover after interrupt.

So currently, clients of writeRAFHeader need to be prepared for all of InterruptDetectedException,
ClosedByInterruptException, AsynchronousCloseException and ClosedChannelException.
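For readers less familiar with NIO, the difference between these exceptions can be reproduced outside Derby. The following is a minimal standalone sketch, not Derby code: the class InterruptedWriteDemo and its methods are hypothetical names, and a single thread plays both the interrupted writer and the later writer that finds the channel closed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of Derby.
class InterruptedWriteDemo {

    /** Attempts one positional write and reports which exception, if any, it raised. */
    static String tryWrite(FileChannel ch) {
        try {
            ch.write(ByteBuffer.wrap(new byte[] {1, 2, 3, 4}), 0L);
            return "none";
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }

    /** Writes twice with the interrupt flag set and returns what each write saw. */
    static String[] observe() {
        String[] seen = new String[2];
        try {
            Path file = Files.createTempFile("interrupt-demo", ".dat");
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
                Thread.currentThread().interrupt();
                seen[0] = tryWrite(ch); // the interrupted write closes the channel
                seen[1] = tryWrite(ch); // later writes find the channel already closed
            } finally {
                Thread.interrupted();   // clear the flag before touching the file again
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }
}
```

The first write should report ClosedByInterruptException and the second ClosedChannelException. AsynchronousCloseException is the variant a thread receives when the channel is closed by another thread while it is blocked in IO. Since ClosedByInterruptException extends AsynchronousCloseException, which extends ClosedChannelException, a handler for ClosedChannelException catches all three.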

writeRAFHeader is used in three locations:

 - RAFContainer#clean
 - RAFContainer#run(CREATE_CONTAINER_ACTION)
 - RAFContainer#run(STUBBIFY_ACTION)

RAFContainer#clean is prepared for InterruptDetectedException only. The issue shows that ClosedChannelException
may also occur, and it is not prepared for that (this bug).

RAFContainer#run(CREATE_CONTAINER_ACTION) is prepared for ClosedByInterruptException and AsynchronousCloseException.
Since IO during container creation is single-threaded, this is sufficient: it should never
need to handle ClosedChannelException/InterruptDetectedException, both of which signal that
another thread saw interrupt on the container channel.

RAFContainer#run(STUBBIFY_ACTION) is part of the removeContainer operation, which should happen
after the container is closed, so it should be single-threaded on the container as well(?).
It should handle ClosedByInterruptException and AsynchronousCloseException and retry, but
currently does not.

If we let writeAtOffset clean up just like writePage, RAFContainer4#writeAtOffset (i.e. also
writeHeader) would only throw InterruptDetectedException, meaning that another thread saw the
interrupt and the caller should retry. This would simplify the logic in RAFContainer: we could
remove the retry logic from RAFContainer#run(CREATE_CONTAINER_ACTION). This could also cover the
retry logic for RAFContainer#run(STUBBIFY_ACTION) wrt its use of writeRAFHeader.

Next, RAFContainer#clean is already handling InterruptDetectedException and would with this
change no longer see ClosedByInterruptException, AsynchronousCloseException or ClosedChannelException.
This should solve DERBY-5325.
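To illustrate the general idea of recovering inside the low-level write instead of in every caller, here is a simplified, single-threaded sketch. It is not the patch and not Derby's actual recovery code: the class RecoveringWriter, its MAX_RETRIES limit and the demo method are hypothetical, and it simply reopens the channel and retries, leaving out the multi-threaded coordination that InterruptDetectedException is used for.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch, not part of Derby.
class RecoveringWriter {
    private static final int MAX_RETRIES = 3; // assumed limit, chosen for illustration
    private final Path path;
    private FileChannel channel;

    RecoveringWriter(Path path) {
        this.path = path;
        this.channel = open(path);
    }

    private static FileChannel open(Path p) {
        try {
            return FileChannel.open(p, StandardOpenOption.READ, StandardOpenOption.WRITE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes data at offset, reopening the channel if an interrupt closed it. */
    void writeAtOffset(byte[] data, long offset) {
        boolean interrupted = false;
        try {
            for (int attempt = 0; ; attempt++) {
                try {
                    writeFull(ByteBuffer.wrap(data), offset);
                    return;
                } catch (ClosedChannelException e) {
                    // Also catches the subclasses ClosedByInterruptException
                    // and AsynchronousCloseException.
                    if (attempt >= MAX_RETRIES) throw new UncheckedIOException(e);
                    interrupted |= Thread.interrupted(); // clear flag so the retry can proceed
                    channel = open(path);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        } finally {
            if (interrupted) Thread.currentThread().interrupt(); // hand the interrupt back
        }
    }

    private void writeFull(ByteBuffer buf, long offset) throws IOException {
        while (buf.hasRemaining()) {
            offset += channel.write(buf, offset);
        }
    }

    void close() {
        try {
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Interrupts the current thread, writes, and checks data and flag survived. */
    static boolean demo() {
        try {
            Path file = Files.createTempFile("recovering-writer", ".dat");
            RecoveringWriter w = new RecoveringWriter(file);
            Thread.currentThread().interrupt();
            w.writeAtOffset(new byte[] {1, 2, 3, 4}, 8L);
            boolean flagKept = Thread.interrupted(); // read and clear for cleanup
            w.close();
            byte[] all = Files.readAllBytes(file);
            Files.delete(file);
            return flagKept && all.length == 12 && all[8] == 1 && all[11] == 4;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With recovery contained in the write method like this, callers no longer need their own handlers for the three channel exceptions, which is the simplification argued for above.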

I did not add a new test for this issue yet since I don't know how to force this scenario.
We have only seen it once, I believe. I'll be running InterruptResilienceTest continuously
with this patch along with the patch for DERBY-5312 on several platforms to gain more confidence.


> Checkpoint fails with ClosedChannelException in InterruptResilienceTest
> -----------------------------------------------------------------------
>
>                 Key: DERBY-5325
>                 URL: https://issues.apache.org/jira/browse/DERBY-5325
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.9.0.0
>         Environment: Solaris 10 5/08 s10x_u5wos_10 X86
> Java(TM) SE Runtime Environment (build 1.7.0-b147)
> Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17 mixed mode)
>            Reporter: Knut Anders Hatlen
>            Assignee: Dag H. Wanvik
>         Attachments: derby-5325a.diff, derby-5325a.stat, derby.log, error-stacktrace.out
>
>
> Seen here: http://dbtg.foundry.sun.com/derby/test/Daily/jvm1.7/testing/testlog/sol/1144688-suitesAll_diff.txt
> There was 1 error:
> 1) testRAFWriteInterrupted(org.apache.derbyTesting.functionTests.tests.store.InterruptResilienceTest)java.sql.SQLException: The exception 'java.sql.SQLException: Log Record has been sent to the stream, but it cannot be applied to the store (Object null).  This may cause recovery problems also.' was thrown while evaluating an expression.
> 	at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
> 	at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
> 	at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
> 	at org.apache.derby.impl.jdbc.EmbedStatement.executeUpdate(Unknown Source)
> 	at org.apache.derbyTesting.functionTests.tests.store.InterruptResilienceTest.testRAFWriteInterrupted(InterruptResilienceTest.java:217)
> (...)
> Caused by: java.nio.channels.ClosedChannelException
> 	at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:94)
> 	at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:691)
> 	at org.apache.derby.impl.store.raw.data.RAFContainer4.writeFull(Unknown Source)
> 	at org.apache.derby.impl.store.raw.data.RAFContainer4.writeAtOffset(Unknown Source)
> 	at org.apache.derby.impl.store.raw.data.FileContainer.writeHeader(Unknown Source)
> 	at org.apache.derby.impl.store.raw.data.RAFContainer.writeRAFHeader(Unknown Source)
> 	at org.apache.derby.impl.store.raw.data.RAFContainer.clean(Unknown Source)
> 	at org.apache.derby.impl.services.cache.ConcurrentCache.cleanAndUnkeepEntry(Unknown Source)
> 	at org.apache.derby.impl.services.cache.ConcurrentCache.cleanCache(Unknown Source)
> 	at org.apache.derby.impl.services.cache.ConcurrentCache.cleanAll(Unknown Source)
> 	at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.checkpoint(Unknown Source)
> 	at org.apache.derby.impl.store.raw.log.LogToFile.checkpointWithTran(Unknown Source)
> 	at org.apache.derby.impl.store.raw.log.LogToFile.checkpoint(Unknown Source)
> 	at org.apache.derby.impl.store.raw.RawStore.checkpoint(Unknown Source)
> 	at org.apache.derby.impl.store.raw.log.LogToFile.performWork(Unknown Source)
> 	at org.apache.derby.impl.services.daemon.BasicDaemon.serviceClient(Unknown Source)
> 	at org.apache.derby.impl.services.daemon.BasicDaemon.work(Unknown Source)
> 	at org.apache.derby.impl.services.daemon.BasicDaemon.run(Unknown Source)
> 	at java.lang.Thread.run(Thread.java:722)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
