db-derby-dev mailing list archives

From "Kathey Marsden (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-4319) hang in suites.all with ibm 1.5 on AIX after ttestDefaultProperties
Date Thu, 10 Mar 2011 21:04:59 GMT

    [ https://issues.apache.org/jira/browse/DERBY-4319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005332#comment-13005332 ]

Kathey Marsden commented on DERBY-4319:

I have the machine back in the state where this reproduces. I am sorry to say that, even with
my prior attempt to get past it, there is still a hang, now in a different method. Since I can
reproduce it now, I should be able to make some progress on this issue. I'll record some
information here in case it becomes hard to reproduce again.

In its current state, the hang involves the launched network server process, which appears
to specify all the drda parameters without values:
cloudtst 6488248 4390978   0 14:41:38      -  0:20 /local1/IBM_JDK/15sr13/sdk/jre/bin/java
  -classpath /local1/kmarsden/repro/derby-4319/jars//derby.jar:/local1/...ars//derbyTesting.jar:/local1/kmarsden/repro/derby-4319/jars//junit.jar
  -Dderby.drda.logConnections= -Dderby.drda.traceAll= -Dderby.drda.traceDirectory=
  -Dderby.drda.keepAlive= -Dderby.drda.timeSlice= -Dderby.drda.host=
  -Dderby.drda.portNumber= -Dderby.drda.minThreads= -Dderby.drda.maxThreads=
  -Dderby.drda.startNetworkServer= -Dderby.drda.debug=
  org.apache.derby.drda.NetworkServerControl start -h localhost -p 1527

I will attach the javacore with thread dump as LaunchedNetworkServer.javacore.20110309.160148.6488248.0001.txt

The server threads look fairly normal, with a ClientThread running and waiting to accept requests.

The test process is hung in NetworkServerTestSetup.complete(). I am not sure whether this is a
later point in the run or whether the change I made simply did not work. I will attach the test process file as:

If I try to ping the server from the command line, I get a "Connection reset" error:
$ java org.apache.derby.drda.NetworkServerControl ping
Thu Mar 10 12:47:39 PST 2011 : Error on client socket:
 Connection reset
Thu Mar 10 12:47:39 PST 2011 : Connection reset
java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:197)
        at java.net.SocketInputStream.read(SocketInputStream.java:116)
        at org.apache.derby.impl.drda.NetworkServerControlImpl.fillReplyBuffer(N
        at org.apache.derby.impl.drda.NetworkServerControlImpl.readResult(Networ
        at org.apache.derby.impl.drda.NetworkServerControlImpl.pingWithNoOpen(Ne
        at org.apache.derby.impl.drda.NetworkServerControlImpl.ping(NetworkServe
        at org.apache.derby.impl.drda.NetworkServerControlImpl.executeWork(Netwo
        at org.apache.derby.drda.NetworkServerControl.main(NetworkServerControl.
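
The first ping failed with a reset and the later ping attempts hung. A client-side probe can at least be kept from hanging indefinitely by bounding both the connect and every read. The sketch below is a hypothetical helper built on plain java.net.Socket, not Derby's NetworkServerControl.ping(); the class name, result values, and request bytes are assumptions for illustration. It only demonstrates the timeout technique (Socket.connect with a timeout plus setSoTimeout).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical helper, not part of Derby: a raw TCP probe that cannot block
// forever, because both connect() and read() are bounded by a timeout.
final class BoundedProbe {
    enum Result { REPLIED, CLOSED, TIMED_OUT, FAILED }

    static Result probe(String host, int port, byte[] request, int timeoutMillis) {
        try (Socket s = new Socket()) {
            // Bound the TCP handshake.
            s.connect(new InetSocketAddress(host, port), timeoutMillis);
            // Bound every subsequent read() on this socket.
            s.setSoTimeout(timeoutMillis);
            OutputStream out = s.getOutputStream();
            out.write(request);
            out.flush();
            InputStream in = s.getInputStream();
            // -1 means the peer closed the connection without replying.
            return in.read() == -1 ? Result.CLOSED : Result.REPLIED;
        } catch (SocketTimeoutException e) {
            // Either the connect or the read exceeded timeoutMillis.
            return Result.TIMED_OUT;
        } catch (IOException e) {
            // For example "Connection refused" or "Connection reset".
            return Result.FAILED;
        }
    }
}
```

A server that accepts the TCP connection but never answers now shows up as TIMED_OUT after timeoutMillis instead of leaving the caller stuck.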

After that, subsequent ping attempts hang, and a new thread dump of the Network Server
process shows that the ClientThread is no longer there. I think this should never happen.
A lot of work has been put into making sure that the ClientThread survives any type of error
so that it can keep hosting more connections. See attachment LaunchedNetworkServerAfterPing.javacore.20110310.124948.6488248.0002.txt
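
The property described above, that the accepting thread should outlive a failure on any single connection, can be illustrated with a minimal accept loop. This is a hypothetical sketch, not Derby's ClientThread; the handler, the error counter, and the shutdown method are assumptions for illustration. It treats an accept() failure as terminal only when shutdown was requested or the listening socket itself has been closed.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch, not Derby's ClientThread: an accept loop that keeps
// serving after a failure on any single connection.
final class ResilientAcceptor implements Runnable {
    private final ServerSocket serverSocket;
    private final Consumer<Socket> handler;   // hypothetical per-connection work
    private final AtomicInteger survivedErrors = new AtomicInteger();
    private volatile boolean shuttingDown;

    ResilientAcceptor(ServerSocket serverSocket, Consumer<Socket> handler) {
        this.serverSocket = serverSocket;
        this.handler = handler;
    }

    int survivedErrors() { return survivedErrors.get(); }

    void shutdown() throws IOException {
        shuttingDown = true;
        serverSocket.close();   // unblocks a pending accept()
    }

    @Override
    public void run() {
        while (!shuttingDown) {
            // The client socket is closed when the handler returns or throws.
            try (Socket client = serverSocket.accept()) {
                handler.accept(client);
            } catch (IOException e) {
                if (shuttingDown || serverSocket.isClosed()) {
                    break;      // the listening socket is gone; nothing left to accept
                }
                survivedErrors.incrementAndGet();
            } catch (RuntimeException e) {
                // A bug in one session must not kill the loop that serves all others.
                survivedErrors.incrementAndGet();
            }
        }
    }
}
```

Running the handler inline keeps the sketch short; a real server would hand each socket to a worker thread so that one slow session does not block accept().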

Another thing to note is that, before the defaultProperties test, there was already a stack
trace with a Connection reset in the setPortPriority test, which did not cause a failure. See
TestOutput2011-03-09.txt .out

This issue actually has many facets that are worth working on:

1) How do we make sure a spawned network server process is destroyed if it hangs the whole

2) Under what circumstances can the Network Server ClientThread, which loops accepting new
connections, be destroyed?

3) What sort of problem is being caused on AIX by starting the network server with these odd options?
I suspect it may be related to soTimeout or keepAlive being set to an unexpected value,
but I am not sure.
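
For facet 3, one mechanism worth checking is how the empty property values are read. A JVM started with -Dderby.drda.keepAlive= sees an empty string, not null, from System.getProperty(), so code that only checks for null treats the empty value as a real setting: Boolean.parseBoolean("") is false, and Integer.parseInt("") throws NumberFormatException. The sketch below is hypothetical and does not reproduce Derby's actual property handling; the class and helper names, and the default of true used for keepAlive, are assumptions for illustration.

```java
// Hypothetical sketch, not Derby's property-handling code. A JVM started with
// "-Dname=" gets "" (not null) from System.getProperty, so null-only checks
// treat the empty value as a real setting.
final class EmptyProperties {
    /** Treats a missing or blank property as "not set" and returns the default. */
    static boolean booleanProperty(String name, boolean defaultValue) {
        String raw = System.getProperty(name);
        return (raw == null || raw.trim().isEmpty()) ? defaultValue : Boolean.parseBoolean(raw.trim());
    }

    /** Same rule for integers; a non-blank but malformed value still throws. */
    static int intProperty(String name, int defaultValue) {
        String raw = System.getProperty(name);
        return (raw == null || raw.trim().isEmpty()) ? defaultValue : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        // Simulates launching with -Dderby.drda.keepAlive= -Dderby.drda.minThreads=
        System.setProperty("derby.drda.keepAlive", "");
        System.setProperty("derby.drda.minThreads", "");

        // Naive parsing: the empty value silently turns keepAlive off...
        System.out.println(Boolean.parseBoolean(System.getProperty("derby.drda.keepAlive")));
        // ...while the defensive helper falls back to the caller's default.
        System.out.println(booleanProperty("derby.drda.keepAlive", true));
        try {
            Integer.parseInt(System.getProperty("derby.drda.minThreads"));
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException for empty minThreads");
        }
        System.out.println(intProperty("derby.drda.minThreads", 0));
    }
}
```

Running main prints false, true, the NumberFormatException line, and 0, which shows how an empty -D value can differ from an absent one.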

I have been holding off on working on 3 because it provides a good reproduction for issues
1 and 2, but I think that at this point the best thing to do would be to disable the problematic
fixture on AIX, whether it is testSetPortPriority or testDefaultProperties. Then I can
work on all three issues in a logical order and at a reasonable pace without release concerns.
I'll look into doing that.

> hang in suites.all with ibm 1.5 on AIX after ttestDefaultProperties
> -------------------------------------------------------------------
>                 Key: DERBY-4319
>                 URL: https://issues.apache.org/jira/browse/DERBY-4319
>             Project: Derby
>          Issue Type: Bug
>          Components: Network Client
>    Affects Versions:
>         Environment: ibm jvm 1.5 SR9-0 on IBM AIX 3.5
>            Reporter: Myrna van Lunteren
>            Assignee: Kathey Marsden
>              Labels: derby_triage10_8
>         Attachments: derby-4317_timeout_for_complete_diff.txt, derby-4319_teardown_kill_on_bad_ping.txt,
>                      javacore.20090723.093837.25380.0001.txt, javacore.20090723.093909.24726.0001.txt
> The test run hung in suites.All. The console output (the run was with -Dderby.tests.trace=true)
> showed that ttestDefaultProperties had completed successfully, but the run was halted.
> ps -eaf | grep java showed the process that kicked off suites.All, and a network server
> process with the following flags:
> -classpath <classpath including derby.jar, derbytools.jar, derbyclient.jar, derbynet.jar,
> derbyTesting.jar, derbyrun.jar, derbyTesting.jar and junit.jar> -Dderby.drda.logConnections=
> -Dderby.drda.traceAll= -Dderby.drda.traceDirectory= -Dderby.drda.keepAlive= -Dderby.drda.timeSlice=
> -Dderby.drda.host= -Dderby.drda.portNumber= -derby.drda.minThreads= -Dderby.drda.maxThreads=
> -Dderby.drda.startNetworkServer= -Dderby.drda.debug= org.apache.derby.drda.NetworkServerControl
> start -h localhost -p 1527
> This process had been sitting for 2 days.
> After killing the NetworkServerControl process, the test continued successfully (except
> for DERBY-4186, fixed in trunk), but the following was written to the console:
>  START-SPAWNED:SpawnedNetworkServer STANDARD OUTPUT: exit code=137
> 2009-07-18 03:16:07.157 GMT : Security manager installed using the Basic server
> security policy.
> 2009-07-18 03:16:09.169 GMT : Apache Derby Network Server - - (794445)
> started and ready to accept connections on port 1527
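
Exit code 137 in the output above is consistent with the manual kill: by Unix convention, an exit status above 128 means the process was terminated by signal (status - 128), and 137 - 128 = 9, which is SIGKILL. For facet 1 in the comment, a test harness could bound its own wait on a spawned server and kill the server itself. The sketch below is a hypothetical watchdog, not Derby's test framework code; the class and method names are assumptions. It relies on Process.waitFor(long, TimeUnit) and Process.destroyForcibly(), which were added in Java 8 and are therefore not available on the IBM 1.5 JVM in this report, and it uses `sleep` as a stand-in for a hung server.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical watchdog for a spawned server process (not Derby's test code).
// Requires Java 8+ for waitFor(long, TimeUnit) and destroyForcibly().
final class SpawnWatchdog {
    /**
     * Waits up to the given timeout for the process to exit. If it is still
     * running, kills it forcibly and waits for it to be reaped. Returns the
     * exit status either way.
     */
    static int waitOrKill(Process process, long timeout, TimeUnit unit) throws InterruptedException {
        if (!process.waitFor(timeout, unit)) {
            process.destroyForcibly();  // SIGKILL on Unix-like systems
            process.waitFor();          // block until the OS has reaped it
        }
        return process.exitValue();
    }

    /** Unix convention: a status above 128 means "terminated by signal (status - 128)". */
    static int signalFromExitStatus(int status) {
        return status > 128 ? status - 128 : 0;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a hung server; assumes a Unix-like system with `sleep` on the PATH.
        Process p = new ProcessBuilder("sleep", "60").start();
        int status = waitOrKill(p, 500, TimeUnit.MILLISECONDS);
        System.out.println("exit status " + status + ", signal " + signalFromExitStatus(status));
    }
}
```

On Linux, the JDK reports a child killed by SIGKILL with status 128 + 9, so the harness would see the same 137 that appeared in the console output above.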

