db-derby-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From d..@apache.org
Subject svn commit: r1040086 - in /db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data: RAFContainer.java RAFContainer4.java
Date Mon, 29 Nov 2010 12:28:55 GMT
Author: dag
Date: Mon Nov 29 12:28:54 2010
New Revision: 1040086

URL: http://svn.apache.org/viewvc?rev=1040086&view=rev
Log:
DERBY-4741 Make Derby work reliably in the presence of thread interrupts

Patch derby-4741-c-01-nio: closes two corner cases I have
observed when stress testing the RAFContainer4 recovery mechanism. It
also does some other small cleanups. Regressions ran OK.

RAFContainer:

If we receive an interrupt when the container is first being opened
(i.e. during RAFContainer.run (OPEN_CONTAINER_ACTION) ->
getEmbryonicPage), recovery will fail because currentIdentity needed
in RAFContainer4#recoverContainerAfterInterrupt hasn't yet been
set.

RAFContainer4:

If a stealthMode read is interrupted and is recovering the container,
it erroneously increments threadsInPageIO just before exiting to retry
IO. This leads to a break in the invariant that threadsInPageIO be 0
when all threads are done, causing issue (hang) down the line.

If, when we are reopening the container, the read being done during
that operation (getEmbryonicPage), that stealth mode read will also
lead to a (recursive) recovery. We have to catch this case by adding a
"catch (InterruptDetectedException e)" just after the call to
openContainer, not by testing the interrupt flag as presently done,
since the recovery inside the recursive call to
getEmbryonicPage/readPage will already have cleared the flag and done
recovery.

When giving up reopening the container for some reason, we also forgot
to decrement threadsInPageIO. 


Modified:
    db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer.java
    db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer4.java

Modified: db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer.java
URL: http://svn.apache.org/viewvc/db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer.java?rev=1040086&r1=1040085&r2=1040086&view=diff
==============================================================================
--- db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer.java
(original)
+++ db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer.java
Mon Nov 29 12:28:54 2010
@@ -898,18 +898,30 @@ class RAFContainer extends FileContainer
 		return true;
     } // end of privRemoveFile
 
-	synchronized boolean openContainer(ContainerKey newIdentity)
+    protected ContainerKey idAPriori = null;
+
+    synchronized boolean openContainer(ContainerKey newIdentity)
         throws StandardException
     {
         actionCode = OPEN_CONTAINER_ACTION;
         actionIdentity = newIdentity;
+        boolean success = false;
+        idAPriori = currentIdentity;
+
         try
         {
-            boolean success = AccessController.doPrivileged(this) != null;
-            if (success) {
-                currentIdentity = newIdentity;
-            }
+            currentIdentity = newIdentity;
+            // NIO: We need to set currentIdentity before we try to open, in
+            // case we need its value to perform a recovery in the case of an
+            // interrupt during readEmbryonicPage as part of
+            // OPEN_CONTAINER_ACTION.  Note that this gives a recursive call to
+            // openContainer.
+            //
+            // If we don't succeed in opening, we reset currentIdentity to its
+            // a priori value.
 
+            success = AccessController.doPrivileged(this) != null;
+            idAPriori = currentIdentity;
             return success;
         }
         catch( PrivilegedActionException pae) { 
@@ -921,7 +933,11 @@ class RAFContainer extends FileContainer
             throw e;
         }
         finally
-        { 
+        {
+            if (!success) {
+                currentIdentity = idAPriori;
+            }
+
             actionIdentity = null; 
         }
     }

Modified: db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer4.java
URL: http://svn.apache.org/viewvc/db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer4.java?rev=1040086&r1=1040085&r2=1040086&view=diff
==============================================================================
--- db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer4.java
(original)
+++ db/derby/code/trunk/java/engine/org/apache/derby/impl/store/raw/data/RAFContainer4.java
Mon Nov 29 12:28:54 2010
@@ -637,6 +637,10 @@ class RAFContainer4 extends RAFContainer
         }
 
         synchronized (channelCleanupMonitor) {
+            // Pave way for the thread that received the interrupt that caused
+            // the channel close to clean up, by signaling we are waiting (no
+            // longer doing IO):
+
             threadsInPageIO--;
         }
 
@@ -671,6 +675,9 @@ class RAFContainer4 extends RAFContainer
                     }
                 }
 
+                // Since the channel is presumably ok (lest giveUpIO is set,
+                // see below), we put ourselveds back in the IO set of threads:
+
                 threadsInPageIO++;
                 break;
             }
@@ -685,6 +692,7 @@ class RAFContainer4 extends RAFContainer
                         "resurrecting container ");
                 }
 
+                threadsInPageIO--;
                 throw StandardException.newException(
                     SQLState.FILE_IO_INTERRUPTED);
             }
@@ -715,7 +723,7 @@ class RAFContainer4 extends RAFContainer
         boolean stealthMode) throws StandardException {
 
         if (stealthMode && restoreChannelInProgress) {
-            // Another interrupted thread got to do the cleanup before us, so
+            // 1) Another interrupted thread got to do the cleanup before us, so
             // yield.
             // This should not happen, but since
             // we had to "fix" NIO, cf. the code marked (**), we could
@@ -734,6 +742,11 @@ class RAFContainer4 extends RAFContainer
             // Not safe for Java 1.4 (only volatile protection for
             // restoreChannelInProgress here), compare safe test below (not
             // stealthMode).
+            //
+            // 2) The other way to end up here is if we get interrupted during
+            // getEmbryonicPage called during container recovery from the same
+            // thread (restoreChannelInProgress is set then, and
+            // getEmbryonicPage is stealthMode)
 
             InterruptStatus.noteAndClearInterrupt(
                 whence,
@@ -808,26 +821,22 @@ class RAFContainer4 extends RAFContainer
                         try {
                             closeContainer();
                             openContainer(currentIdentity);
-                        } catch (Exception newE) {
+                        } catch (InterruptDetectedException e) {
                             // Interrupted again?
-
-                            if (InterruptStatus.noteAndClearInterrupt(
-                                        "RAF: isInterrupted during recovery",
-                                        threadsInPageIO,
-                                        hashCode())) {
-                                continue;
-                            } else {
-                                // Something else failed - shutdown happening?
-                                synchronized(giveUpIOm) {
-                                    // Make sure other threads will give up and
-                                    // throw, too.
-                                    giveUpIO = true;
-
-                                    if (SanityManager.DEBUG) {
-                                        debugTrace(
-                                            "can't resurrect container: " +
-                                            newE);
-                                    }
+                            debugTrace("interrupted during recovery's " +
+                                       "readEmbryonicPage");
+                            continue;
+                        } catch (Exception newE) {
+                            // Something else failed - shutdown happening?
+                            synchronized(giveUpIOm) {
+                                // Make sure other threads will give up and
+                                // throw, too.
+                                giveUpIO = true;
+
+                                if (SanityManager.DEBUG) {
+                                    debugTrace(
+                                        "can't resurrect container: " +
+                                        newE);
                                 }
 
                                 throw StandardException.newException(
@@ -838,7 +847,12 @@ class RAFContainer4 extends RAFContainer
                     }
                 }
 
-                threadsInPageIO++;
+                if (stealthMode) {
+                    // don't touch threadsInPageIO
+                } else {
+                    threadsInPageIO++;
+                }
+
                 // retry IO
             } finally {
                 // Recovery work done (or failed), now set other threads free



Mime
View raw message