db-derby-dev mailing list archives

From "Dag H. Wanvik (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-4186) After failover, test fails when it succeeds in connecting early to failed over slave
Date Mon, 27 Apr 2009 19:10:30 GMT

     [ https://issues.apache.org/jira/browse/DERBY-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dag H. Wanvik updated DERBY-4186:
---------------------------------

    Attachment: derby-4186.diff
                ok-slave.txt
                bad-slave.txt

Further analysis shows that the uncaught IOException I referred to,
thrown when trying to send a stop message to the slave, is just part of the
generic MasterController.tearDownNetwork path taken when establishing the master fails initially.
The test correctly loops while waiting for the master to start up successfully, and some
attempts fail (while waiting for the slave to be ready) and pass through that failure code path.
So it is not a problem, just (another) red herring.

I enclose a patch proposal which addresses the real issue: the slave does not shut down when
it should.
The scenario is that the slave receives a stop replication message before it has had time to
complete the slave boot (a race condition); see attachment bad-slave.txt. If I make the test client
wait instead of proceeding to stop the master, the slave log looks like the one in ok-slave.txt
(attached).
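
To make the race concrete, here is a minimal sketch of one way a stop request that
arrives before boot completion could still be honored. It is not the contents of
derby-4186.diff and does not use Derby's replication classes; SlaveLifecycleSketch and
all of its method names are hypothetical, chosen only for illustration.

```java
// Hypothetical illustration only: not Derby code and not the attached patch.
class SlaveLifecycleSketch {
    private boolean booted = false;        // set once the slave boot has completed
    private boolean stopRequested = false; // set when a stop message has been received
    private boolean shutDown = false;      // set when the slave has actually shut down

    // Called when a stop replication message arrives, possibly before boot completes.
    synchronized void onStopMessage() {
        stopRequested = true;
        if (booted) {
            shutDown = true;
        }
        // If not booted yet, the request is remembered and handled in onBootComplete().
    }

    // Called when the slave boot finishes; honors a stop that arrived too early.
    synchronized void onBootComplete() {
        booted = true;
        if (stopRequested) {
            shutDown = true;
        }
    }

    synchronized boolean isShutDown() {
        return shutDown;
    }
}
```

In this sketch the order of the two events does not matter: the slave ends up shut down
whether the stop message arrives before or after the boot completes.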

It would be nice if one of the original authors could have a look at this patch, as I am not
familiar with this code.

The patch also modifies the test client to loop until success, accepting the intermediate state
CANNOT_CONNECT_TO_DB_IN_SLAVE_MODE.
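
A rough sketch of such a loop, for illustration: it retries a connection attempt while
the attempt fails with SQLState 08004, and rethrows any other SQLState. This is not the
code in derby-4186.diff; ConnectRetrySketch, connectWithRetry and its parameters are
assumed names, and the attempt is passed in as a Callable so the sketch does not depend
on a running Derby server.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of the Derby test framework.
class ConnectRetrySketch {
    // SQLState that the issue text associates with CANNOT_CONNECT_TO_DB_IN_SLAVE_MODE.
    static final String SLAVE_MODE_STATE = "08004";

    // Repeats the attempt until it succeeds, tolerating only the slave-mode state.
    static <T> T connectWithRetry(Callable<T> attempt, int maxTries, long sleepMillis)
            throws Exception {
        SQLException last = null;
        for (int i = 0; i < maxTries; i++) {
            try {
                return attempt.call();
            } catch (SQLException e) {
                if (!SLAVE_MODE_STATE.equals(e.getSQLState())) {
                    throw e; // any other state is a real failure
                }
                last = e; // expected intermediate state: failover not finished yet
                Thread.sleep(sleepMillis);
            }
        }
        throw new SQLException(
                "gave up after " + maxTries + " attempts", SLAVE_MODE_STATE, last);
    }
}
```

A test client could pass a lambda that calls DriverManager.getConnection with the slave's
URL as the attempt; the retry count and sleep interval are tuning choices.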

Running regressions. 

> After failover, test fails when it succeeds in connecting early to failed over slave
> ------------------------------------------------------------------------------------
>
>                 Key: DERBY-4186
>                 URL: https://issues.apache.org/jira/browse/DERBY-4186
>             Project: Derby
>          Issue Type: Bug
>          Components: Replication, Test
>    Affects Versions: 10.6.0.0
>            Reporter: Dag H. Wanvik
>         Attachments: bad-slave.txt, derby-4186.diff, ok-slave.txt
>
>
> Occasionally I see this error in ReplicationRun_Local_3_p3:
> 1) testReplication_Local_3_p3_StateNegativeTests(org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p3)junit.framework.AssertionFailedError: Expected SQLState'08004', but got connection!
> 	at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.waitForSQLState(ReplicationRun.java:332)
> 	at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p3.testReplication_Local_3_p3_StateNegativeTests(ReplicationRun_Local_3_p3.java:170)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:105)
> 	at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
> 	at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
> 	at junit.extensions.TestSetup.run(TestSetup.java:25)
> In the code, after a stopMaster is given to the master (should lead to fail-over),
> the test expects to see CANNOT_CONNECT_TO_DB_IN_SLAVE_MODE (08004.C.7), which will only succeed if
> the test gets to try to connect before the failover has started. This seems wrong. If the failover
> has completed, it should expect a successful
> connect (which boots the database, btw, since it's shut down after a successful failover).
> Quote from code:
> waitForSQLState("08004", 100L, 20, // 08004.C.7 - CANNOT_CONNECT_TO_DB_IN_SLAVE_MODE
>                 slaveDatabasePath + FS + slaveDbSubPath + FS + replicatedDb,
>                 slaveServerHost, slaveServerPort); // _failOver above fails...
> There is a race between the failover on the slave and the test here I think.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

