db-derby-dev mailing list archives

From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-5975) intermittent nightly test failure across releases in Derby5937SlaveShutdownTest.testSlaveFailoverLeak
Date Thu, 01 Nov 2012 09:27:12 GMT

    [ https://issues.apache.org/jira/browse/DERBY-5975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488567#comment-13488567 ]

Knut Anders Hatlen commented on DERBY-5975:
-------------------------------------------

Thanks, Mike.

It looks like replication failover fails with a NullPointerException:

Caused by: java.lang.NullPointerException
	at java.io.ObjectOutputStream.drain(ObjectOutputStream.java:258)
	at java.io.ObjectOutputStream.flush(ObjectOutputStream.java:331)
	at java.io.ObjectOutputStream.close(ObjectOutputStream.java:220)
	at org.apache.derby.impl.store.replication.net.SocketConnection.tearDown(Unknown Source)
	at org.apache.derby.impl.store.replication.net.ReplicationMessageTransmit.tearDown(Unknown Source)
	at org.apache.derby.impl.store.replication.master.MasterController.teardownNetwork(Unknown Source)
	at org.apache.derby.impl.store.replication.master.MasterController.startFailover(Unknown Source)
	at org.apache.derby.impl.store.raw.RawStore.failover(Unknown Source)
	at org.apache.derby.impl.store.access.RAMAccessManager.failover(Unknown Source)
	at org.apache.derby.impl.db.BasicDatabase.failover(Unknown Source)
	at org.apache.derby.impl.jdbc.EmbedConnection.handleFailoverMaster(Unknown Source)
	... 41 more

Since the NPE happens inside java.io.ObjectOutputStream.drain(), and not in Derby code called
from drain(), I'd expect this to be a JVM bug. At least I have convinced myself that this
NPE is impossible in OpenJDK by looking at the source for the ObjectOutputStream class. The
argument I used to convince myself goes like this: In OpenJDK, ObjectOutputStream's drain()
method simply forwards the call to bout.drain(), so any NPE in drain must be caused by the
bout field being null. bout is a final field which is always initialized to a non-null value
in the ObjectOutputStream constructor we use in replication.net.SocketConnection, so it's
guaranteed to be non-null in drain(), and the NPE can't happen.
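
The argument above can be sketched as a small Java model. This is only an illustration, not the OpenJDK source: BufferedBlock and ModelStream are hypothetical stand-ins, and closeReal() assumes the public one-argument ObjectOutputStream constructor is the one in question. The point is that a final field assigned in the only constructor cannot be null by the time close() reaches drain().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class DrainSketch {

    /** Hypothetical stand-in for the internal buffered stream that drain() delegates to. */
    static final class BufferedBlock {
        private final OutputStream out;
        private final byte[] buf = new byte[16];
        private int pos;

        BufferedBlock(OutputStream out) {
            this.out = out;
        }

        void write(int b) throws IOException {
            if (pos == buf.length) {
                drain(); // buffer full: push it to the underlying stream first
            }
            buf[pos++] = (byte) b;
        }

        void drain() throws IOException {
            out.write(buf, 0, pos);
            pos = 0;
        }
    }

    /** Simplified model: bout is final and assigned in the only constructor. */
    static final class ModelStream {
        private final BufferedBlock bout;

        ModelStream(OutputStream out) {
            this.bout = new BufferedBlock(out); // never null after construction
        }

        void write(int b) throws IOException { bout.write(b); }

        // Same call chain as in the stack trace: close() -> flush() -> drain().
        void drain() throws IOException { bout.drain(); }
        void flush() throws IOException { drain(); }
        void close() throws IOException { flush(); }
    }

    /** Writes n bytes to a model stream, closes it, and returns how many bytes reached the sink. */
    static int closeModel(int n) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            ModelStream s = new ModelStream(sink);
            for (int i = 0; i < n; i++) {
                s.write(i);
            }
            s.close();
            return sink.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Opens and closes a real ObjectOutputStream; returns the number of bytes written to the sink. */
    static int closeReal() {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(sink);
            oos.close(); // close() flushes, which drains the internal buffer
            return sink.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("model bytes: " + closeModel(40));
        System.out.println("real bytes:  " + closeReal());
    }
}
```

On a JVM that behaves like OpenJDK, neither call should throw. An NPE raised directly in drain() on a correctly constructed stream would therefore point at the class library or the VM rather than at the calling code.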

I think it's OK to disable this test on weme for now. Note that this is the only replication
test that runs on weme (or any of the JSR-169/CDC FP platforms), so this code has never been
exercised on weme before. All the other replication tests use separate network servers and
the client driver to communicate with them, and are therefore disabled on those platforms.
                
> intermittent nightly test failure across releases in Derby5937SlaveShutdownTest.testSlaveFailoverLeak
> -----------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-5975
>                 URL: https://issues.apache.org/jira/browse/DERBY-5975
>             Project: Derby
>          Issue Type: Bug
>          Components: Test
>    Affects Versions: 10.8.2.3, 10.9.1.1, 10.10.0.0
>         Environment: windows weme6.2
>            Reporter: Mike Matrigali
>         Attachments: fail.zip
>
>
> Across multiple versions nightly tests have failed in Derby5937SlaveShutdownTest.testSlaveFailoverLeak.
> Subsequent to this no other test runs and thus we get no info printed to the log, and the ibm test
> reporter does not post anything other than a red box if the tests do not finish.  Not sure if the
> tests are hanging as part of trying to clean up the failure or if the next test is hanging.  Will post
> test runs that have failed in additional comments.
> So far I have only seen this on weme6.2 windows runs.  Likely there is a timing issue that causes the
> test to fail and then bad cleanup of this test leads to hang.  In the one stack I see a thread stuck
> in shutdown and a thread stuck waiting on the log.
> If no easy fixes for this it may make sense to disable this test in this one environment until someone
> wants to work on this one.  Then we can at least get the rest of the testing to proceed.
> (emb)jdbcapi.DatabaseMetaDataTest.testGetColumns_DERBY5274 used 343 ms .
> (emb)jdbcapi.DatabaseMetaDataTest.testDMDconnClosed used 79 ms  Test upgrade done.
> Test upgrade from: 10.9.1.0, phase: POST UPGRADE
> .
> (emb)upgradeTests.BasicSetup.noConnectionAfterHardUpgrade used 156 ms  Test upgrade done.
> .
> (emb)replicationTests.Derby5937SlaveShutdownTest.testSlaveFailoverLeak used 24221 ms F

