Message-ID: <343588155.1240859670515.JavaMail.jira@brutus>
Date: Mon, 27 Apr 2009 12:14:30 -0700 (PDT)
From: "Dag H. Wanvik (JIRA)"
To: derby-dev@db.apache.org
Subject: [jira] Issue Comment Edited: (DERBY-4186) After failover, test fails when it succeeds in connecting early to failed over slave
In-Reply-To: <392312840.1240598610301.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/DERBY-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702626#action_12702626 ]

Dag H.
Wanvik edited comment on DERBY-4186 at 4/27/09 12:13 PM:
----------------------------------------------------------------

My initial analysis was not entirely correct. Looking at the log file, I see that setting up the master never succeeded in the cases where we see 08004.C.7. This in turn led stopMaster to fail (there is no master yet!), but the operation does not throw because of this piece of code in MasterController.tearDownNetwork, called from MasterController.stopMaster:

    try {
        ReplicationMessage mesg = new ReplicationMessage(ReplicationMessage.TYPE_STOP, null);
        transmitter.sendMessage(mesg);
    } catch (IOException ioe) {} // <************ java.net.ConnectException: Connection refused

    try {
        transmitter.tearDown();
    } catch (IOException ioe) {}

The end result is that the slave is still listening when the test gets around to calling waitForSQLState (see the issue description), so we naturally get 08004.C.7 CANNOT_CONNECT_TO_DB_IN_SLAVE_MODE. But the test is also wrong; it should expect success here.

Now the next question is: why does the test think starting the master worked? It calls the method ReplicationRun.startMaster to achieve this.

[2009.04.27 comment added: this turned out to be a red herring, see below.]

was (Author: dagw):
My initial analysis was not entirely correct. Looking at the log file, I see that setting up the master never succeeded in the cases where we see 08004.C.7.
This in turn led stopMaster to fail (there is no master yet!), but the operation does not throw because of this piece of code in MasterController.tearDownNetwork, called from MasterController.stopMaster:

    try {
        ReplicationMessage mesg = new ReplicationMessage(ReplicationMessage.TYPE_STOP, null);
        transmitter.sendMessage(mesg);
    } catch (IOException ioe) {} // <************ java.net.ConnectException: Connection refused

    try {
        transmitter.tearDown();
    } catch (IOException ioe) {}

The end result is that the slave is still listening when the test gets around to calling waitForSQLState (see the issue description), so we naturally get 08004.C.7 CANNOT_CONNECT_TO_DB_IN_SLAVE_MODE. But the test is also wrong; it should expect success here.

Now the next question is: why does the test think starting the master worked? It calls the method ReplicationRun.startMaster to achieve this.

> After failover, test fails when it succeeds in connecting early to failed over slave
> ------------------------------------------------------------------------------------
>
>                 Key: DERBY-4186
>                 URL: https://issues.apache.org/jira/browse/DERBY-4186
>             Project: Derby
>          Issue Type: Bug
>          Components: Replication, Test
>    Affects Versions: 10.6.0.0
>            Reporter: Dag H. Wanvik
>         Attachments: bad-slave.txt, derby-4186.diff, derby-4186.stat, ok-slave.txt
>
>
> Occasionally I see this error in ReplicationRun_Local_3_p3:
> 1) testReplication_Local_3_p3_StateNegativeTests(org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p3)junit.framework.AssertionFailedError: Expected SQLState'08004', but got connection!
> 	at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.waitForSQLState(ReplicationRun.java:332)
> 	at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p3.testReplication_Local_3_p3_StateNegativeTests(ReplicationRun_Local_3_p3.java:170)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:105)
> 	at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
> 	at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
> 	at junit.extensions.TestSetup.run(TestSetup.java:25)
> In the code, after a stopMaster is given to the master (which should lead to failover),
> the test expects to see CANNOT_CONNECT_TO_DB_IN_SLAVE_MODE (08004.C.7), which will only succeed if
> the test gets to try to connect before the failover has started. This seems wrong. If the failover has completed, it should expect a successful
> connect (which boots the database, btw, since it's shut down after a successful failover).
> Quote from code:
>     waitForSQLState("08004", 100L, 20, // 08004.C.7 - CANNOT_CONNECT_TO_DB_IN_SLAVE_MODE
>             slaveDatabasePath + FS + slaveDbSubPath + FS + replicatedDb,
>             slaveServerHost, slaveServerPort); // _failOver above fails...
> There is a race between the failover on the slave and the test here, I think.
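
To make the race described above concrete, here is a minimal sketch of a wait loop that accepts either outcome: the expected SQLState (the slave is still in slave mode) or a successful connection (the failover has already completed). This is not the actual ReplicationRun code and not the attached derby-4186.diff. The class FailoverWaitSketch, the ConnectAttempt interface, the Outcome enum and the method waitForStateOrConnection are all hypothetical names, and the connection attempt is abstracted away so the sketch does not depend on a running Derby server.

```java
import java.sql.SQLException;

// Hypothetical sketch only: none of these names exist in the Derby test suite.
final class FailoverWaitSketch {

    /** Stand-in for a single connection attempt against the slave database. */
    interface ConnectAttempt {
        void run() throws SQLException;
    }

    /** Which of the two acceptable outcomes was observed. */
    enum Outcome { EXPECTED_STATE, CONNECTED }

    /**
     * Retries the attempt up to maxAttempts times, sleeping between tries.
     * Returns as soon as the attempt either connects or fails with the
     * expected SQLState; any other SQLState is treated as "not ready yet".
     */
    static Outcome waitForStateOrConnection(String expectedState, long sleepMillis,
                                            int maxAttempts, ConnectAttempt attempt) {
        String lastState = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                attempt.run();
                // Failover already completed, so a connection is legitimate.
                return Outcome.CONNECTED;
            } catch (SQLException e) {
                lastState = e.getSQLState();
                if (expectedState.equals(lastState)) {
                    // Slave has not failed over yet, e.g. 08004.
                    return Outcome.EXPECTED_STATE;
                }
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", ie);
            }
        }
        throw new AssertionError("Expected SQLState '" + expectedState
                + "' or a connection, last SQLState seen was " + lastState);
    }
}
```

A caller would pass the real connection attempt as the last argument and could branch on the returned Outcome if the two cases need different follow-up checks. Whether both outcomes should really count as a pass at this point in the test is a judgment for the test author.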