db-derby-dev mailing list archives

From Daniel John Debrunner <...@apache.org>
Subject Re: [jira] Commented: (DERBY-3514) SecureServerTest failing with timeout waiting for the network server to start only when run in derbynet._Suite
Date Sat, 08 Mar 2008 15:38:53 GMT
Mike Matrigali wrote:
> Daniel John Debrunner (JIRA) wrote:
>> [ https://issues.apache.org/jira/browse/DERBY-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576389#action_12576389 ]
>> Daniel John Debrunner commented on DERBY-3514:
>> ----------------------------------------------
>> Issue was due to an earlier test calling a network server command that 
>> failed (setting trace on with an invalid directory).
>> For most of the network server commands if an exception is thrown the 
>> network socket is never closed (left to garbage collection).
>> Most likely the longer wait that NetworkServerTestSetup had was enough 
>> time to get the socket closed and thus freed up for the network server 
>> to use.
>> I accidentally committed a reduced wait time in NetworkServerTestSetup 
>> yesterday while working on DERBY-3504.
>> I plan on leaving this reduced time (10 seconds to start the server 
>> rather than the old 300 seconds)  as the server should come up in that 
>> time and my belief is that extending the time is really just hiding 
>> bugs (like this one).
> Is 10 seconds really the number across all platforms, accounting for any
> other activity that may be happening on the machine?  It would be nice 
> if our tests didn't fail mysteriously with a timeout error if some other
> activity on the machine happened to affect performance.

10 seconds may be too low, but I think 300 is too high. There isn't a lot 
of code needed to start the network server. Maybe we could leave it at 10 
for a while and see if anyone hits any problems.
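
To make the timeout discussion concrete, here is a minimal sketch of a bounded wait for a server to come up. It is not the NetworkServerTestSetup code; the class name ServerStartWait, the method waitForPort and its parameters are all made up for illustration, and a plain TCP connect stands in for whatever readiness check the real test setup uses.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Derby.
public class ServerStartWait {

    /**
     * Polls host:port until a TCP connection succeeds or timeoutMillis elapses.
     * Returns true if a connection was accepted in time, false on timeout.
     */
    public static boolean waitForPort(String host, int port,
                                      long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            // try-with-resources closes the probe socket on every path
            try (Socket probe = new Socket()) {
                probe.connect(new InetSocketAddress(host, port),
                              (int) Math.max(1, pollMillis));
                return true;
            } catch (IOException notUpYet) {
                // server not accepting connections yet; retry until the deadline
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(pollMillis);
        }
    }
}
```

With a helper like this, the choice between 10 and 300 seconds is a single argument, for example waitForPort("localhost", port, 10_000, 100), where port is whatever port the server under test listens on.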

> It would be great if someone could fix all the tests to clean up 
> properly, but for now poring through intermittent timeout diffs is
> not helping me tell whether my latest change broke the codeline.
> The reality is that I will now run fewer tests, ignoring the failures
> in the tests that time out.  With the timeout set high, at least the
> tests run and the functionality is tested.  I agree that a bug in a 
> test might be missed, or something more serious, such as network
> startup starting to take 5 minutes every time.

Note that the bug I just fixed that was hidden by the 300-second 
timeout was a bug in the network server code, not in any test.

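
The quoted JIRA comment describes the pattern behind that bug: when a command threw an exception, the socket was never closed and was left to garbage collection. The sketch below shows the general remedy, closing the socket on both the normal and the exceptional path. It is not Derby's actual code or the actual fix; CommandSocketSketch, runCommand and the simulated failure are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

// Hypothetical illustration, not Derby's network server code.
public class CommandSocketSketch {

    // Kept only so a caller can inspect the socket afterwards (illustration).
    static Socket lastSocket;

    /**
     * Opens a socket, sends a placeholder byte, and optionally fails.
     * The socket is closed whether or not an exception is thrown.
     */
    static void runCommand(String host, int port, boolean failCommand)
            throws IOException {
        try (Socket socket = new Socket(host, port)) {
            lastSocket = socket;
            OutputStream out = socket.getOutputStream();
            out.write(1); // placeholder for a real command
            out.flush();
            if (failCommand) {
                throw new IOException("simulated command failure");
            }
        } // socket.close() runs here on both paths
    }
}
```

Without the try-with-resources (or an equivalent finally block), a failed command would leave the connection open until the collector happened to finalize it.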
> Do you have any hints on how to find the previous test that may be
> causing the bug if I am hitting timeouts for this reason?

Just like debugging any other issue, remove elements until the problem 
disappears and then determine which one is causing the problem.
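
One way to do that removal systematically is to halve the list of earlier tests each time. The sketch below assumes exactly one earlier test triggers the failure and that the failure is reproducible; CulpritBisect, findCulprit and the failsWith predicate (which would run the given earlier tests followed by the failing one and report whether it still fails) are hypothetical names.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, not part of the Derby test harness.
public class CulpritBisect {

    /**
     * Returns the single element whose presence makes failsWith return true.
     * Assumes exactly one such element exists in candidates.
     */
    public static <T> T findCulprit(List<T> candidates,
                                    Predicate<List<T>> failsWith) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no candidates to search");
        }
        List<T> remaining = candidates;
        while (remaining.size() > 1) {
            int mid = remaining.size() / 2;
            List<T> firstHalf = remaining.subList(0, mid);
            // If the failure persists with only the first half, the culprit is
            // there; otherwise it must be in the second half.
            remaining = failsWith.test(firstHalf)
                    ? firstHalf
                    : remaining.subList(mid, remaining.size());
        }
        return remaining.get(0);
    }
}
```

For n earlier tests this takes about log2(n) runs instead of n. If the failure depends on two earlier tests together, or is intermittent, the single-culprit assumption breaks and removing tests by hand is still needed.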

