Return-Path: X-Original-To: apmail-db-derby-commits-archive@www.apache.org Delivered-To: apmail-db-derby-commits-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 291526B47 for ; Tue, 19 Jul 2011 14:29:30 +0000 (UTC) Received: (qmail 18250 invoked by uid 500); 19 Jul 2011 14:29:30 -0000 Delivered-To: apmail-db-derby-commits-archive@db.apache.org Received: (qmail 18213 invoked by uid 500); 19 Jul 2011 14:29:29 -0000 Mailing-List: contact derby-commits-help@db.apache.org; run by ezmlm Precedence: bulk list-help: list-unsubscribe: List-Post: Reply-To: "Derby Development" List-Id: Delivered-To: mailing list derby-commits@db.apache.org Received: (qmail 18206 invoked by uid 99); 19 Jul 2011 14:29:29 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 19 Jul 2011 14:29:29 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.4] (HELO eris.apache.org) (140.211.11.4) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 19 Jul 2011 14:29:26 +0000 Received: from eris.apache.org (localhost [127.0.0.1]) by eris.apache.org (Postfix) with ESMTP id 620C42388901; Tue, 19 Jul 2011 14:29:04 +0000 (UTC) Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: svn commit: r1148354 - in /db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data: RAFContainer.java RAFContainer4.java Date: Tue, 19 Jul 2011 14:29:04 -0000 To: derby-commits@db.apache.org From: dag@apache.org X-Mailer: svnmailer-1.0.8 Message-Id: <20110719142904.620C42388901@eris.apache.org> X-Virus-Checked: Checked by ClamAV on apache.org Author: dag Date: Tue Jul 19 14:29:03 2011 New Revision: 1148354 URL: http://svn.apache.org/viewvc?rev=1148354&view=rev Log: DERBY-5325 Checkpoint fails with ClosedChannelException in InterruptResilienceTest Patch derby-5325a: With NIO, writeRAFHeader has two methods leading to interruptible IO: - getEmbryonicPage - writeHeader Currently, getEmbryonicPage may throw InterruptDetectedException and hence, so may writeRAFHeader. writeHeader may throw ClosedByInterruptException, AsynchronousCloseException and ClosedChannelException because writeHeader does not use RAFContainer4#writePage, but rather uses RAFContainer4#writeAtOffset, which does not currently attempt to recover after interrupt. So currently, clients of writeRAFHeader need to be prepared for all of InterruptDetectedException, ClosedByInterruptException, AsynchronousCloseException and ClosedChannelException. writeRAFHeader is used in three locations: - RAFContainer#clean - RAFContainer#run(CREATE_CONTAINER_ACTION) - RAFContainer#run(STUBBIFY_ACTION) RAFContainer#clean is prepared for InterruptDetectedException only. The issue shows that ClosedChannelException may also occur, and it is not prepared for that (this bug). RAFContainer#run(CREATE_CONTAINER_ACTION) is prepared for ClosedByInterruptException and AsynchronousCloseException. Since IO during container creation is single-threaded, this is sufficient: it should never need to handle ClosedChannelException/InterruptDetectedException, both of which signal that another thread saw interrupt on the container channel. RAFContainer#run(STUBBIFY_ACTION) is part of the removeContainer operation which should happen after the container is closed, so it should be single-threaded on the container as well(?). It should handle ClosedByInterruptException and AsynchronousCloseException and do retry, but doesn't, currently. If we let writeAtOffset clean up just like writePage, RAFContainer4#writeAtOffset (i.e.also writeHeader) would only only throw InterruptDetectedException, i.e. another thread saw interrupt, so retry. This would simplify logic in RAFContainer: we could remove the retry logic from RAFContainer#run(CREATE_CONTAINER_ACTION). This could also cover retry logic for RAFContainer#run(STUBBIFY_ACTION) wrt its use of writeRAFHeader. Next, RAFContainer#clean is already handling InterruptDetectedException and would with this change no longer see ClosedByInterruptException, AsynchronousCloseException or ClosedChannelException. This should solve DERBY-5325 (this bug). Modified: db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer.java db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer4.java Modified: db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer.java URL: http://svn.apache.org/viewvc/db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer.java?rev=1148354&r1=1148353&r2=1148354&view=diff ============================================================================== --- db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer.java (original) +++ db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer.java Tue Jul 19 14:29:03 2011 @@ -1328,96 +1328,83 @@ class RAFContainer extends FileContainer throw StandardException.newException( SQLState.FILE_CREATE, se, file); } - boolean success = false; - int maxTries = MAX_INTERRUPT_RETRIES; - while (!success) { - success = true; - - try { + try { - // OK not to force WAL here, in fact, this operation - // preceeds the creation of the log record to ensure - // sufficient space. + // OK not to force WAL here, in fact, this operation + // preceeds the creation of the log record to ensure + // sufficient space. - dataFactory.writeInProgress(); - try + dataFactory.writeInProgress(); + try { fileData = file.getRandomAccessFile( "rw"); } - finally + finally { dataFactory.writeFinished(); } - // This container format specifies that the first page is - // an allocation page and the container information is - // stored within it. The allocation page needs to be - // somewhat formatted because if the system crashed after - // the create container log operation is written, it needs - // to be well formed enough to get the container - // information back out of it. - // - // Don't try to go thru the page cache here because the - // container object cannot be found in the container cache - // at this point yet. However, if we use the page cache - // to store the first allocation page, then in order to - // write itself out, it needs to ask the container to do - // so, which is going to create a deadlock. The - // allocation page cannot write itself out without going - // thru the container because it doesn't know where its - // offset is. Here we effectively hardwire page 0 at - // offset 0 of the container file to be the first - // allocation page. + // This container format specifies that the first page is + // an allocation page and the container information is + // stored within it. The allocation page needs to be + // somewhat formatted because if the system crashed after + // the create container log operation is written, it needs + // to be well formed enough to get the container + // information back out of it. + // + // Don't try to go thru the page cache here because the + // container object cannot be found in the container cache + // at this point yet. However, if we use the page cache + // to store the first allocation page, then in order to + // write itself out, it needs to ask the container to do + // so, which is going to create a deadlock. The + // allocation page cannot write itself out without going + // thru the container because it doesn't know where its + // offset is. Here we effectively hardwire page 0 at + // offset 0 of the container file to be the first + // allocation page. + + // create an embryonic page - if this is not a temporary + // container, synchronously write out the file header. + + canUpdate = true; // Need to set it now. After writeRAFHeader + // may be too late in case that method's IO + // is interrupted and container needs + // reopening. To get the correct "rw" mode + // we need canUpdate to be true. + + writeRAFHeader( + actionIdentity, fileData, true, + (actionIdentity.getSegmentId() != + ContainerHandle.TEMPORARY_SEGMENT)); - // create an embryonic page - if this is not a temporary - // container, synchronously write out the file header. - writeRAFHeader( - actionIdentity, fileData, true, - (actionIdentity.getSegmentId() != - ContainerHandle.TEMPORARY_SEGMENT)); - - } catch (IOException ioe) { - Class clazz = ioe.getClass(); - - // test with reflection since NIO is not in Foundation 1.1 - if (clazz.getName().equals( - "java.nio.channels.ClosedByInterruptException") || - clazz.getName().equals( // Java NIO Bug 6979009: - "java.nio.channels.AsynchronousCloseException")) { - - if (--maxTries > 0) { - success = false; - InterruptStatus.setInterrupted(); - closeContainer(); - continue; - } - } + } catch (IOException ioe) { - boolean fileDeleted; - try { - fileDeleted = privRemoveFile(file); - } catch (SecurityException se) { - throw StandardException.newException( - SQLState.FILE_CREATE_NO_CLEANUP, - ioe, - file, - se.toString()); - } + canUpdate = false; - if (!fileDeleted) { - throw StandardException.newException( - SQLState.FILE_CREATE_NO_CLEANUP, - ioe, - file, - ioe.toString()); - } + boolean fileDeleted; + try { + fileDeleted = privRemoveFile(file); + } catch (SecurityException se) { + throw StandardException.newException( + SQLState.FILE_CREATE_NO_CLEANUP, + ioe, + file, + se.toString()); + } + if (!fileDeleted) { throw StandardException.newException( - SQLState.FILE_CREATE, ioe, file); + SQLState.FILE_CREATE_NO_CLEANUP, + ioe, + file, + ioe.toString()); } + + throw StandardException.newException( + SQLState.FILE_CREATE, ioe, file); } - canUpdate = true; return null; } // end of case CREATE_CONTAINER_ACTION Modified: db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer4.java URL: http://svn.apache.org/viewvc/db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer4.java?rev=1148354&r1=1148353&r2=1148354&view=diff ============================================================================== --- db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer4.java (original) +++ db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer4.java Tue Jul 19 14:29:03 2011 @@ -622,10 +622,6 @@ class RAFContainer4 extends RAFContainer threadsInPageIO, hashCode()); - // Recovery is in progress, wait for another - // interrupted thread to clean up, i.e. act as if we - // had seen ClosedChannelException. - awaitRestoreChannel(e, stealthMode); if (retries-- == 0) { throw StandardException.newException( @@ -1089,10 +1085,68 @@ class RAFContainer4 extends RAFContainer throws IOException, StandardException { FileChannel ioChannel = getChannel(file); - if (ioChannel != null) { - writeFull(ByteBuffer.wrap(bytes), ioChannel, offset); - } else { + + if (ioChannel == null) { super.writeAtOffset(file, bytes, offset); + return; + } + + ourChannel = ioChannel; + + boolean success = false; + boolean stealthMode = true; + + while (!success) { + + synchronized (this) { + // don't use ourChannel directly, could need re-initilization + // after interrupt and container reopening: + ioChannel = getChannel(); + } + + try { + writeFull(ByteBuffer.wrap(bytes), ioChannel, offset); + success = true; + //} catch (ClosedByInterruptException e) { + // Java NIO Bug 6979009: + // http://bugs.sun.com/view_bug.do?bug_id=6979009 + // Sometimes NIO throws AsynchronousCloseException instead of + // ClosedByInterruptException + } catch (AsynchronousCloseException e) { + // Subsumes ClosedByInterruptException + + // The interrupted thread may or may not get back here + // before other concurrent writers that will see + // ClosedChannelException, we have logic to handle that. + + if (Thread.currentThread().isInterrupted()) { + // Normal case + if (recoverContainerAfterInterrupt( + e.toString(), + stealthMode)) { + continue; // do I/O over again + } + } + // Recovery is in progress, wait for another + // interrupted thread to clean up, i.e. act as if we + // had seen ClosedChannelException. + + // stealthMode == true, so this will throw + // InterruptDetectedException + awaitRestoreChannel(e, stealthMode); + } catch (ClosedChannelException e) { + // We are not the thread that first saw the channel interrupt, + // so no recovery attempt. + + InterruptStatus.noteAndClearInterrupt( + "writeAtOffset in ClosedChannelException", + threadsInPageIO, + hashCode()); + + // stealthMode == true, so this will throw + // InterruptDetectedException + awaitRestoreChannel(e, stealthMode); + } } }