db-derby-dev mailing list archives

From Daniel John Debrunner <...@apache.org>
Subject Re: [jira] Commented: (DERBY-3514) SecureServerTest failing with timeout waiting for the network server to start only when run in derbynet._Suite
Date Sat, 08 Mar 2008 15:38:53 GMT
Mike Matrigali wrote:
> Daniel John Debrunner (JIRA) wrote:
>>     [ https://issues.apache.org/jira/browse/DERBY-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576389#action_12576389 ]
>> Daniel John Debrunner commented on DERBY-3514:
>> ----------------------------------------------
>>
>> Issue was due to an earlier test calling a network server command that 
>> failed (setting trace on with an invalid directory).
>> For most of the network server commands, if an exception is thrown the 
>> network socket is never explicitly closed (it is left to garbage collection).
>> Most likely the longer wait that NetworkServerTestSetup had was enough 
>> time to get the socket closed and thus freed up for the network server 
>> to use.
>>
>> I accidentally committed a reduced wait time in NetworkServerTestSetup 
>> yesterday while working on DERBY-3504.
>> I plan on leaving this reduced time (10 seconds to start the server 
>> rather than the old 300 seconds)  as the server should come up in that 
>> time and my belief is that extending the time is really just hiding 
>> bugs (like this one).
>>
> Is 10 seconds really the number across all platforms, accounting for any
> other activity that may be happening on the machine?  It would be nice 
> if our tests didn't fail mysteriously with a timeout error if some other
> activity on the machine happened to affect performance.

10 seconds may be too low, but I think 300 is too high. There isn't a lot 
of code needed to start the network server. Maybe we could leave it at 10 
for a while and see if anyone hits any problems.
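
To make the trade-off concrete, here is a minimal Java sketch of waiting for a server to come up with a fixed deadline. It is not the NetworkServerTestSetup code: the class name, the method names, and the plain TCP connect used as the "is it up" check are all assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch; not part of the Derby test framework.
class StartupWaitSketch {

    // Polls the loopback port until a TCP connect succeeds or the deadline passes.
    static boolean waitForPort(int port, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            try (Socket probe = new Socket()) {
                probe.connect(
                    new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 250);
                return true; // something is listening
            } catch (IOException notUpYet) {
                // Not listening yet; retry until the deadline.
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // report the timeout instead of waiting longer
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    // Demo: a listener that is already up is found well within 10 seconds.
    static boolean demoListeningPortIsFound() {
        try (ServerSocket listener =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            return waitForPort(listener.getLocalPort(), 10_000);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("found listener: " + demoListeningPortIsFound());
    }
}
```

With a loop like this, a larger timeout only changes how long a genuine startup problem stays hidden before it is reported.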

> It would be great if someone could fix all the tests to clean up
> properly, but for now poring over intermittent timeout diffs is
> not helping me tell whether my latest change broke the codeline.
> The reality is that I will now run fewer tests, ignoring the failures
> in those tests that time out.  With the timeout set high, at least the
> tests run and the functionality is tested.  I agree that a bug in a test
> might be missed, or something more serious, such as network startup
> starting to take 5 minutes every time.

Note that the bug I just fixed that was hidden by the 300 seconds 
timeout was a bug in the network server code, not in any test.
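
As a rough illustration of that class of bug (not the actual network server code), the Java sketch below shows a command that fails after connecting but still closes its socket deterministically through try-with-resources, rather than leaving it to garbage collection. All names here are hypothetical, and the failure is simulated.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch; not the Derby network server implementation.
class CommandSocketSketch {

    // Remembers the most recent command socket so the demo can inspect it.
    static Socket lastSocket;

    // Simulated command that throws after the connection is established.
    static void failingCommand(int port) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port)) {
            lastSocket = socket;
            throw new IOException("simulated failure, e.g. invalid trace directory");
        } // the socket is closed here even though the body threw
    }

    // Runs the failing command against a local listener and reports
    // whether the command's socket was closed afterwards.
    static boolean socketClosedAfterFailure() {
        try (ServerSocket listener =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            try {
                failingCommand(listener.getLocalPort());
            } catch (IOException expected) {
                // The command is expected to fail; only the socket state matters.
            }
            return lastSocket != null && lastSocket.isClosed();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("closed after failure: " + socketClosedAfterFailure());
    }
}
```

On a pre-Java 7 codebase the same effect would come from closing the socket in a finally block.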
> 
> Do you have any hints on how to find the previous test that may be
> causing the bug if I am hitting timeouts for this reason?

Just like debugging any other issue, remove elements until the problem 
disappears and then determine which one is causing the problem.
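
One way to speed up that elimination is to bisect the list of earlier tests. The Java sketch below assumes exactly one earlier test is responsible and that the failure reproduces reliably; the class, the method, and the `failsAfter` predicate (which would run the given tests followed by the failing one) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; assumes a single culprit and a deterministic failure.
class CulpritBisectSketch {

    // failsAfter.test(subset) should return true when running `subset`
    // before the failing test still reproduces the failure.
    static <T> T findCulprit(List<T> earlier, Predicate<List<T>> failsAfter) {
        if (earlier.isEmpty()) {
            throw new IllegalArgumentException("no earlier tests to search");
        }
        List<T> candidates = earlier;
        while (candidates.size() > 1) {
            int mid = candidates.size() / 2;
            List<T> firstHalf = candidates.subList(0, mid);
            // Keep whichever half still contains the culprit.
            candidates = failsAfter.test(firstHalf)
                ? firstHalf
                : candidates.subList(mid, candidates.size());
        }
        return candidates.get(0);
    }

    public static void main(String[] args) {
        // Placeholder test names; "D" plays the role of the culprit.
        List<String> earlier = Arrays.asList("A", "B", "C", "D", "E");
        System.out.println(findCulprit(earlier, subset -> subset.contains("D")));
    }
}
```

If the failure is intermittent, each subset may need to be run several times before trusting a "does not fail" result.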

Dan.
