db-derby-dev mailing list archives

From "Dag H. Wanvik (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (DERBY-4186) After failover, test fails when it succeeds in connecting early to failed over slave
Date Mon, 27 Apr 2009 19:14:30 GMT

    [ https://issues.apache.org/jira/browse/DERBY-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702626#action_12702626 ]

Dag H. Wanvik edited comment on DERBY-4186 at 4/27/09 12:13 PM:
----------------------------------------------------------------

My initial analysis was not entirely correct. Looking at the log file, I see that
setting up the master never succeeded in the cases where we see 08004.C.7.
This in turn led stopMaster to fail (there is no master yet!), but the operation does
not throw because of this piece of code in MasterController.tearDownNetwork, called
from MasterController.stopMaster:
            try {
                ReplicationMessage mesg =
                    new ReplicationMessage(ReplicationMessage.TYPE_STOP, null);
                transmitter.sendMessage(mesg);
            } catch (IOException ioe) {}   // <************ java.net.ConnectException: Connection refused
            try {
                transmitter.tearDown();
            } catch (IOException ioe) {}

The end result of this is that the slave is still listening when the test comes around to
calling waitForSQLState (see the issue description), so we naturally get 08004.C.7
CANNOT_CONNECT_TO_DB_IN_SLAVE_MODE. But the test is also wrong: it should expect success here.
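
To illustrate why the empty catch blocks quoted above hide the failed stop, here is a
minimal, self-contained sketch. It is not Derby code: the Transmitter interface and both
tearDown methods are hypothetical stand-ins, and the "reporting" variant is only one
possible way to let a caller notice that the stop message never reached the peer.

```java
import java.io.IOException;
import java.net.ConnectException;

public class SwallowedStopDemo {

    /** Hypothetical stand-in for the replication transmitter (not the Derby class). */
    public interface Transmitter {
        void sendStop() throws IOException;
        void tearDown() throws IOException;
    }

    /** Mirrors the quoted pattern: both IOExceptions are dropped, so the caller learns nothing. */
    public static void tearDownSilently(Transmitter t) {
        try {
            t.sendStop();
        } catch (IOException ioe) {
            // swallowed, as in the quoted code
        }
        try {
            t.tearDown();
        } catch (IOException ioe) {
            // swallowed, as in the quoted code
        }
    }

    /** Still best effort, but reports whether the stop message was actually sent. */
    public static boolean tearDownReporting(Transmitter t) {
        boolean stopDelivered = true;
        try {
            t.sendStop();
        } catch (IOException ioe) {
            stopDelivered = false; // remember the failure instead of hiding it
        }
        try {
            t.tearDown();
        } catch (IOException ioe) {
            // nothing more to release; the result only reflects the stop message
        }
        return stopDelivered;
    }

    public static void main(String[] args) {
        // Simulates a peer that was never reachable.
        Transmitter unreachable = new Transmitter() {
            public void sendStop() throws IOException {
                throw new ConnectException("Connection refused");
            }
            public void tearDown() {
                // no resources held in this simulation
            }
        };
        tearDownSilently(unreachable); // returns normally, failure is invisible
        System.out.println("stop delivered: " + tearDownReporting(unreachable));
    }
}
```

With the silent variant the caller cannot tell a delivered stop from a refused connection,
which is how the test could carry on while the slave was still listening.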

Now the next question is, why does the test think starting the master worked? It calls the
method ReplicationRun.startMaster to achieve this.

[2009.04.27 comment added: this turned out to be a red herring, see below.]




  
> After failover, test fails when it succeeds in connecting early to failed over slave
> ------------------------------------------------------------------------------------
>
>                 Key: DERBY-4186
>                 URL: https://issues.apache.org/jira/browse/DERBY-4186
>             Project: Derby
>          Issue Type: Bug
>          Components: Replication, Test
>    Affects Versions: 10.6.0.0
>            Reporter: Dag H. Wanvik
>         Attachments: bad-slave.txt, derby-4186.diff, derby-4186.stat, ok-slave.txt
>
>
> Occasionally I see this error in ReplicationRun_Local_3_p3:
> 1) testReplication_Local_3_p3_StateNegativeTests(org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p3)junit.framework.AssertionFailedError: Expected SQLState'08004', but got connection!
> 	at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.waitForSQLState(ReplicationRun.java:332)
> 	at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p3.testReplication_Local_3_p3_StateNegativeTests(ReplicationRun_Local_3_p3.java:170)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:105)
> 	at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
> 	at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
> 	at junit.extensions.TestSetup.run(TestSetup.java:25)
> In the code, after a stopMaster is given to the master (should lead to fail-over),
> the test expects to see CANNOT_CONNECT_TO_DB_IN_SLAVE_MODE (08004.C.7), which will only succeed if
> the test gets to try to connect before the failover has started. This seems wrong. If the failover has completed, it should expect a successful
> connect (which boots the database, btw, since it's shut down after a successful failover).
> Quote from code:
> waitForSQLState("08004", 100L, 20, // 08004.C.7 - CANNOT_CONNECT_TO_DB_IN_SLAVE_MODE
>                 slaveDatabasePath + FS + slaveDbSubPath + FS + replicatedDb,
>                 slaveServerHost, slaveServerPort); // _failOver above fails...
> There is a race between the failover on the slave and the test here, I think.
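
One way to make a test insensitive to this race is to treat 08004 as "failover not
finished yet" and a successful connect as the expected end state. The sketch below is
an assumption-laden illustration, not the actual ReplicationRun.waitForSQLState: the
ConnectAttempt interface and waitForFailover method are hypothetical names, and the
slave is simulated rather than contacted over JDBC.

```java
import java.sql.SQLException;

public class FailoverWaitSketch {

    /** Hypothetical abstraction over "try to connect to the slave database". */
    public interface ConnectAttempt {
        void connect() throws SQLException;
    }

    /**
     * Polls until a connection succeeds. An SQLException whose SQLState equals
     * slaveModeState is treated as "still in slave mode" and retried after a pause;
     * any other SQLState is rethrown immediately.
     *
     * @return the number of attempts that were refused before the successful one
     */
    public static int waitForFailover(ConnectAttempt attempt, String slaveModeState,
                                      long sleepMillis, int maxAttempts)
            throws SQLException, InterruptedException {
        SQLException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                attempt.connect();
                return i; // connected: failover has completed
            } catch (SQLException se) {
                if (!slaveModeState.equals(se.getSQLState())) {
                    throw se; // unrelated failure, do not mask it
                }
                last = se;
                Thread.sleep(sleepMillis);
            }
        }
        throw new SQLException(
            "Still in slave mode after " + maxAttempts + " attempts", slaveModeState, last);
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated slave: refuses the first two attempts with 08004, then accepts.
        ConnectAttempt simulated = () -> {
            if (calls[0]++ < 2) {
                throw new SQLException("database is in slave mode", "08004");
            }
        };
        int refused = waitForFailover(simulated, "08004", 10L, 20);
        System.out.println("connected after " + refused + " refused attempts");
    }
}
```

Because both outcomes are handled, the result no longer depends on whether the test
connects before or after the slave has finished failing over.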

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

