db-derby-dev mailing list archives

From "Knut Anders Hatlen (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DERBY-5643) Occasional hangs in replication tests on Linux
Date Mon, 19 Mar 2012 17:29:37 GMT

     [ https://issues.apache.org/jira/browse/DERBY-5643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-5643:
--------------------------------------

    Attachment: fail-on-timeout.diff

I've found some problems with the previous changes. I'm not sure whether that's what's causing
the problems in the nightly testing, though.

The replication tests and the compatibility tests now call pingForServerUp() and wait until
the server is up or a timeout happens, but they don't fail if a timeout happens. The attached
patch (fail-on-timeout.diff) makes those tests check the return value of pingForServerUp()
and fail if the server did not come up in time.
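
As a rough illustration of the pattern (not the actual patch), the sketch below polls a
hypothetical ping callback until it succeeds or a timeout expires, and the caller fails when
the result is false. The class PingSketch, the methods pingUntilUp() and startServerOrFail(),
and the 10 ms sleep are assumptions made for this example only; they are not Derby's test API.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: names and parameters are illustrative, not Derby's test code.
public class PingSketch {

    /**
     * Polls the ping callback until it reports success or the timeout expires.
     * Returns true if the server answered in time, false otherwise.
     */
    public static boolean pingUntilUp(BooleanSupplier ping, long timeoutMillis, long sleepMillis) {
        long start = System.nanoTime();
        long timeoutNanos = timeoutMillis * 1_000_000L;
        while (true) {
            if (ping.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - start >= timeoutNanos) {
                return false;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and report the server as not up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    /** Fails (instead of silently continuing) when the server did not come up. */
    public static void startServerOrFail(BooleanSupplier ping, long timeoutMillis) {
        if (!pingUntilUp(ping, timeoutMillis, 10)) {
            throw new AssertionError("Server did not come up within " + timeoutMillis + " ms");
        }
    }
}
```

The point is only that the boolean result is checked: a timeout turns into a test failure at
the place where the server failed to start, rather than into a confusing failure later on.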

The patch also changes how AutoloadTest.testAutoNetworkServerBoot() verifies that the server
did not come up. In the existing code, it pings until the timeout happens, and then returns
successfully. This took 40 seconds before the timeout was changed, and now it takes 4 minutes.
The timeout is not supposed to affect tests under normal operation, so it should not have
pinged that long in the first place.

The patch makes the test wait for a shorter time (5 seconds) before concluding that the server
didn't come up. This may cause bugs to go unnoticed on very slow machines (if the server comes
up when it shouldn't, but it takes more than 5 seconds), but it will speed up the test considerably
and still detect the problem on reasonably fast machines.
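
A minimal sketch of such a negative check, assuming a hypothetical ping callback: it watches
for a short, fixed window and fails only if the server answers within that window. The class
and method names and the 50 ms polling interval are invented for illustration and are not
taken from AutoloadTest.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: names and parameters are illustrative, not Derby's test code.
public class ServerDownCheckSketch {

    /** Returns true if the ping callback succeeds at any point within the window. */
    public static boolean cameUpWithin(BooleanSupplier ping, long windowMillis, long sleepMillis) {
        long start = System.nanoTime();
        long windowNanos = windowMillis * 1_000_000L;
        while (System.nanoTime() - start < windowNanos) {
            if (ping.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop watching.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    /** Fails if the server came up although it was expected to stay down. */
    public static void assertServerDidNotComeUp(BooleanSupplier ping, long windowMillis) {
        if (cameUpWithin(ping, windowMillis, 50)) {
            throw new AssertionError("Server came up unexpectedly within " + windowMillis + " ms");
        }
    }
}
```

With a window of 5000 ms, a call like assertServerDidNotComeUp(ping, 5000) takes about
5 seconds when the server stays down, and it cannot see a server that needs longer than
that to start, which is the trade-off described above.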
                
> Occasional hangs in replication tests on Linux
> ----------------------------------------------
>
>                 Key: DERBY-5643
>                 URL: https://issues.apache.org/jira/browse/DERBY-5643
>             Project: Derby
>          Issue Type: Bug
>          Components: Replication, Test
>    Affects Versions: 10.9.0.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>             Fix For: 10.9.0.0
>
>         Attachments: fail-on-timeout.diff, higher-timeout.diff, thread-dump.txt, waitFor-2.diff, waitFor.diff
>
>
> We occasionally see hangs in the replication tests on Linux. For example here: http://dbtg.foundry.sun.com/derby/test/Daily/jvm1.6/testing/testlog/sles/1298470-suitesAll_diff.txt
> This test run was stuck in tearDown() after ReplicationRun_Local_Derby4910.testSlaveWaitsForMaster(). (Waiting for Thread.join() to return.)
