db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dag H. Wanvik (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DERBY-4269) Failover did not succeed in 2 min.: testReplication_Local_3_p6_autocommit_OK
Date Wed, 13 Jun 2012 15:33:24 GMT

    [ https://issues.apache.org/jira/browse/DERBY-4269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293709#comment-13293709
] 

Dag H. Wanvik edited comment on DERBY-4269 at 6/13/12 3:30 PM:
---------------------------------------------------------------

Yes, that was my conclusion too. The usage of property lists is extremely convoluted in the
Derby boot process, and it was very hard to understand what's going on. And since this goes
on in a forked VM, using a debugger is tricky without affecting the timing... I replaced all
usages of java.util.Properties with a specialized version (DerbyProperties), so I could track
all allocations and modifications of properties. Adding to the problem is the fact that Properties
can have default Properties lists recursively, and this is being used. In addition Derby adds
two layers of specializations: UpdateServiceProperties and DoubleProperties.

Mostly, property lists are passed into constructors without any cloning of them so the number
of live aliases is high.

It turns out the layering at the time of the NPE in the boot is like this:

DoubleProperties#write instance of
   UpdateServiceProperties#defaults instance of
       Properties#defaults instance of
           Properties (no default)   <----- race on this one

The bottom one is the one that's being changed under our feet in these lines in EmbedConnection#bootDatabase:

    :
    // clear these values as some modules hang onto                                  
    // the properties set corresponding to service.properties                        
    // and they shouldn't be interested in these JDBC attributes.                    
    info.clear(); 
    
It is initially allocated in EmbedConnection#filterProperties:

    private Properties filterProperties(Properties inputSet) {                           
   
        Properties limited = new org.apache.derby.iapi.util.DerbyProperties();           
   
        :

assigning it to "info".

The thread that is doing this is the thread that initially started the slave operation (cf.
URL "startSlave=true"). This thread hangs until the master has been started and replication
is underway[1]. When the master comes online this thread the connect returns with XRE08 ("REPLICATION_SLAVE_STARTED_OK").

The failover happens in this test almost immediately, and the SlaveDatabaseBootThread then
starts moving again (it i waiting for more log records to replay or failover).

Unfortunately, in the current scenario, the thread doing "startSlave=true" is still active,
normally it would have finished long before SlaveDatabaseBootThread starts moving again.

The Property default at the 4th level above would normally be empty, since its being cleared
by the "startSlave=true" thread in the "info.clear". Bit since it's lagging behind, it only
does this "clear" when the SlaveDatabaseBootThread has started its boot, leading to the contention.


[1] In EmbedConnection.bootDatabase, this corresponds to the line:

// try to start the service if it doesn't already exist
if (!Monitor.startPersistentService(dbname, info)) {
    
 

                
      was (Author: dagw):
    Yes, that was my conclusion too. The usage of property lists is extremely convoluted in
the Derby boot process, and it was very hard to understand what's going on. And since this
goes on in a forked VM, using a debugger is tricky without affecting the timing... I replaced
all usages of java.util.Properties with a specialized version (DerbyProperties), so I could
track all allocations and modifications of properties. Adding to the problem is the fact that
Properties can have default Properties lists recursively, and this is being used. In addition
Derby adds two layers of specializations: UpdateServiceProperties and DoubleProperties.

Mostly, property lists are passed into constructors without any cloning of them so the number
of live aliases is high.

It turns out the layering at the time of the NPE in the boot is like this:

DoubleProperties#write instance of
   UpdateServiceProperties#defaults instance of
       Properties#defaults instance of
           Properties (no default)   <----- race on this one

The bottom one is the one that's being changed under our feet in these lines in EmbedConnection#bootDatabase:

    :
    // clear these values as some modules hang onto                                  
    // the properties set corresponding to service.properties                        
    // and they shouldn't be interested in these JDBC attributes.                    
    info.clear(); 
    
It is initially allocated in EmbedConnection#filterProperties:

    private Properties filterProperties(Properties inputSet) {                           
   
        Properties limited = new org.apache.derby.iapi.util.DerbyProperties();           
   
        :

assigning it to "info".

The thread that is doing this is the thread that initially started the slave operation (cf.
URL "startSlave=true"). This thread hangs until the master has been started and replication
is underway. When the master comes online this thread the connect returns with XRE08 ("REPLICATION_SLAVE_STARTED_OK").

The failover happens in this test almost immediately, and the SlaveDatabaseBootThread then
starts moving again (it i waiting for more log records to replay or failover).

Unfortunately, in the current scenario, the thread doing "startSlave=true" is still active,
normally it would have finished long before SlaveDatabaseBootThread starts moving again.

The Property default at the 4th level above would normally be empty, since its being cleared
by the "startSlave=true" thread in the "info.clear". Bit since it's lagging behind, it only
does this "clear" when the SlaveDatabaseBootThread has started its boot, leading to the contention.



 

                  
> Failover did not succeed in 2 min.: testReplication_Local_3_p6_autocommit_OK
> ----------------------------------------------------------------------------
>
>                 Key: DERBY-4269
>                 URL: https://issues.apache.org/jira/browse/DERBY-4269
>             Project: Derby
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 10.4.1.3, 10.4.2.0, 10.5.1.1, 10.5.2.0, 10.5.3.0, 10.6.1.0, 10.6.2.1,
10.7.1.1, 10.8.1.2, 10.8.2.2
>         Environment: OS:
> Microsoft© Windows VistaT Ultimate - 6.0.6001 Service Pack 1 - WindowsNT 0 6
> JVM:
> Sun Microsystems Inc. 
> java version "1.4.2_16"
> Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_16-b05)
> Java HotSpot(TM) Client VM (build 1.4.2_16-b05 mixed mode 32-bit)
>            Reporter: Ole Solberg
>            Assignee: Dag H. Wanvik
>              Labels: derby_triage10_5_2
>         Attachments: 4269-client-jstack.txt, 4269-master.txt, 4269-slave-jstack-before-failover.txt,
4269-slave.txt, DERBY-4269.diff, DERBY-4269.stat, DERBY-4269b.diff, db_master-derby.log, db_slave-derby.log.gz,
derby-4269-typo.diff
>
>
> Failover did not succeed.
> 2) testReplication_Local_3_p6_autocommit_OK(org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p6)junit.framework.AssertionFailedError:
Failover did not succeed.
> 	at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun.connectPing(ReplicationRun.java:270)
> 	at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p6.derby_3896(ReplicationRun_Local_3_p6.java:200)
> 	at org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationRun_Local_3_p6.testReplication_Local_3_p6_autocommit_OK(ReplicationRun_Local_3_p6.java:86)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:106)
> 	at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
> 	at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
> 	at junit.extensions.TestSetup.run(TestSetup.java:25)
> See http://dbtg.thresher.com/derby/test/Daily/jvm1.4/testing/testlog/vista-64/782274-suitesAll_diff.txt

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

Mime
View raw message