db-derby-dev mailing list archives

From Jørgen Løland <Jorgen.Lol...@Sun.COM>
Subject Re: [jira] Updated: (DERBY-4186) After failover, test fails when it succeeds in connecting early to failed over slave
Date Thu, 30 Apr 2009 06:30:05 GMT

Thanks for analyzing and fixing this strange issue! Stopping replication 
before the startSlave command had completed was never on my mind :-/

I had a look at your patch, though, and I think you can fix this bug with 
even less code.

 From SlaveDatabase.java:86:
     /** Set by the database boot thread if it fails before slave mode
      * has been started properly (i.e., if inBoot is true). This
      * exception will then be reported to the client connection. */
     private volatile StandardException bootException;

bootException is only set in one place - SlaveDatabase#handleShutdown. 
There you'll also see the reason for the limbo state that made the tests 
fail: if an exception makes the slave replication code call 
handleShutdown while booting is in progress, the database is supposed to 
be shut down by the client thread when it receives an exception from 

As you already found out, that didn't happen because the bootException 
was set during the 500 ms wait in verifySuccessfulBoot. However, 
this should apply to any exception in bootException, not only 
DATABASE_SEVERITY ones (although I *think* only DB severity exceptions 
will be reported here).

I would go with the same code that is inside the while. Thus, instead of

+        if (bootException != null &&
+            SQLState.SHUTDOWN_DATABASE.startsWith(
+                bootException.getSQLState()) &&
+            bootException.getSeverity() == ExceptionSeverity.DATABASE_SEVERITY) {

I would simply use

+        if (bootException != null)
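
To illustrate the idea, here is a minimal, self-contained sketch. It is not 
the actual SlaveDatabase code: the class BootCheckSketch, the inBoot flag, 
bootCompleted() and the pollMillis parameter are made up for this example, 
and a plain Exception stands in for StandardException. It only shows why 
the check after the loop should look at any stored exception.

```java
/** Hypothetical stand-in for the slave boot check; not Derby's real code. */
class BootCheckSketch {
    /** Set by the boot thread if it fails while boot is still in progress. */
    private volatile Exception bootException;

    /** Assumed flag: true until the simulated boot has finished. */
    private volatile boolean inBoot = true;

    /** Boot thread: remember the failure so the client thread can report it. */
    void handleShutdown(Exception cause) {
        if (inBoot) {
            bootException = cause;
        }
    }

    /** Boot thread: mark the simulated boot as finished. */
    void bootCompleted() {
        inBoot = false;
    }

    /** Client thread: poll until boot finishes, then check one last time. */
    void verifySuccessfulBoot(long pollMillis) throws Exception {
        while (inBoot) {
            if (bootException != null) {
                throw bootException;
            }
            Thread.sleep(pollMillis);
        }
        // The exception may have been stored during the last sleep, after
        // the in-loop check. Report it whatever its SQLState or severity.
        if (bootException != null) {
            throw bootException;
        }
    }

    public static void main(String[] args) {
        BootCheckSketch db = new BootCheckSketch();
        // Simulate a failure that arrives just before the loop would exit.
        db.handleShutdown(new IllegalStateException("boot failed"));
        db.bootCompleted();
        try {
            db.verifySuccessfulBoot(10L);
            System.out.println("Boot verified");
        } catch (Exception e) {
            System.out.println("Reported to client: " + e.getMessage());
        }
    }
}
```

Without the check after the loop, verifySuccessfulBoot in this sketch would 
return normally and the stored exception would be lost, which is the limbo 
state described above.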

Dag H. Wanvik (JIRA) wrote:
>      [ https://issues.apache.org/jira/browse/DERBY-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> Dag H. Wanvik updated DERBY-4186:
> ---------------------------------
>     Attachment: derby-4186-2.stat
>                 derby-4186-2.diff
> Respin of this patch, #2, with more comments. I also talked to the author of this code
> off-line, Jørgen Løland, and he agreed with my analysis. The new patch moves the check for
> the lost exception to inside the method SlaveDatabase.verifySuccessfulBoot.
> Added more explanations in the comments.
>> After failover, test fails when it succeeds in connecting early to failed over slave
>> ------------------------------------------------------------------------------------
>>                 Key: DERBY-4186
>>                 URL: https://issues.apache.org/jira/browse/DERBY-4186
>>             Project: Derby
>>          Issue Type: Bug
>>          Components: Replication, Test
>>    Affects Versions:
>>            Reporter: Dag H. Wanvik
>>            Assignee: Dag H. Wanvik
>>         Attachments: bad-slave.txt, derby-4186-2.diff, derby-4186-2.stat, derby-4186.diff, derby-4186.stat, ok-slave.txt
>> Occasionally I see this error in ReplicationRun_Local_3_p3:
>> 1) testReplication_Local_3_p3_StateNegativeTests(org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p3)junit.framework.AssertionFailedError: Expected SQLState'08004', but got connection!
>> 	at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.waitForSQLState(ReplicationRun.java:332)
>> 	at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p3.testReplication_Local_3_p3_StateNegativeTests(ReplicationRun_Local_3_p3.java:170)
>> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> 	at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:105)
>> 	at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
>> 	at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
>> 	at junit.extensions.TestSetup.run(TestSetup.java:25)
>> In the code, after a stopMaster is given to the master (should lead to fail-over),
>> the test expects to see CANNOT_CONNECT_TO_DB_IN_SLAVE_MODE (08004.C.7), which will only
>> succeed if the test gets to try to connect before the failover has started. This seems
>> wrong. If the failover has completed, it should expect a successful connect (which boots
>> the database, btw, since it's shut down after a successful failover).
>> Quote from code:
>> waitForSQLState("08004", 100L, 20, // 08004.C.7 - CANNOT_CONNECT_TO_DB_IN_SLAVE_MODE
>>                 slaveDatabasePath + FS + slaveDbSubPath + FS + replicatedDb,
>>                 slaveServerHost, slaveServerPort); // _failOver above fails...
>> There is a race between the failover on the slave and the test here I think.

Jørgen Løland
