directory-dev mailing list archives

From Jörg Henne (JIRA) <j...@apache.org>
Subject [jira] Commented: (DIRSERVER-586) Reliable hang of DS during query
Date Thu, 03 Aug 2006 10:49:15 GMT
    [ http://issues.apache.org/jira/browse/DIRSERVER-586?page=comments#action_12425489 ] 
            
Jörg Henne commented on DIRSERVER-586:
--------------------------------------

Thanks for your continued feedback, Emmanuel!

I'll answer your points one-by-one:

1) Even other threads within my test case can continue their work undisturbed. Connections
from other sources are not a problem at all, either. The symptom is simply that some connections
seem to just go dead.
To give you an idea of how many are affected: I usually run the test with 10 threads, each
executing about 200 interactions with the server (100 object creations, 100 deletions). Of
those 10 threads, usually about 1-3 run into the hang.
As stated earlier: when a connection runs into the hung state, the corresponding channel is
no longer returned from Selector.select() calls. My earlier observation that the channel is
completely lost from the selector's channel list was bunk, btw. It is still there, but simply
not selected. This may very well be a problem with the runtime libraries or even the LDAP client.
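One well-known way for a channel to stay registered with a selector and yet never be returned from select() is for its interest set to no longer contain OP_READ (for instance, if read interest is cleared while a write is pending and never restored). Whether that is what happens in DIRSERVER-586 is only an assumption. The sketch below is not the server's actual code; it just demonstrates the behavior with a java.nio Pipe instead of a socket, and the class name InterestOpsDemo is made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class InterestOpsDemo {

    /**
     * Returns {ready keys without OP_READ, 1 if the key is still registered,
     * ready keys after OP_READ is restored}. Data is pending in both cases.
     */
    public static int[] readyCounts() throws IOException {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                pipe.source().configureBlocking(false);
                // Register with an empty interest set: the key is in selector.keys().
                SelectionKey key = pipe.source().register(selector, 0);

                // Make one byte available on the read side.
                pipe.sink().write(ByteBuffer.wrap(new byte[] {42}));

                // Registered but not selected, because OP_READ is not of interest.
                int withoutRead = selector.selectNow();
                boolean stillRegistered = selector.keys().contains(key);

                // Restoring OP_READ makes the pending data visible to select().
                key.interestOps(SelectionKey.OP_READ);
                int withRead = selector.select(1000);

                return new int[] {withoutRead, stillRegistered ? 1 : 0, withRead};
            } finally {
                pipe.source().close();
                pipe.sink().close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int[] r = readyCounts();
        System.out.println("ready without OP_READ: " + r[0]);
        System.out.println("still registered: " + (r[1] == 1));
        System.out.println("ready with OP_READ: " + r[2]);
    }
}
```

In a debugger, checking the interestOps() of the hung connection's SelectionKey would be one way to rule this explanation in or out.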

2) Good idea, but still: hangs as before.

3) 
JRockit: hey, I've wanted to try this for a long time. Now it's time to do so.
Test 1: Server on JRockit, client (unit test) on SUN: still hangs.
Test 2: Server on JRockit, client on JRockit: no hang. What was that? Several tries: IT! DOESN'T! HANG! Wow.
Test 3: Server on SUN, client on JRockit: still no hang.
Interesting.

IBM JVM:
Test 1: server on SUN, client on IBM: hang!
Test 2: server on IBM, client on IBM: hang!

Observation on the side: the test runs 3-4 times slower on the IBM and SUN JVMs than on JRockit
(even though some threads don't even make it to the end due to a hang!). The effect on the
server side seems to be far less pronounced, which might be due to the log output on the client
side.

While we're at it, some completely unscientific benchmarks of the server JVM. The client is always
on JRockit (since that is the only JVM on which the client always makes it to the end, it doesn't
make sense to compare using other JVMs for the client), and each setup was run multiple times to
allow for some JIT burn-in:
- SUN 1.5.0_07: ~5000ms per test run.
- IBM 1.5: ~5700ms
- JRockit: ~4000ms (OMG!)
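The "run multiple times to allow for some JIT burn-in" approach can be sketched as a tiny harness: discard a few warm-up runs, then report the mean wall-clock time of the measured runs. The class name WarmedUpTimer, the run counts and the placeholder workload are illustrative assumptions, not the actual test code.

```java
public class WarmedUpTimer {

    /**
     * Runs the task warmupRuns times without timing it (giving the JIT a chance to
     * compile the hot paths), then returns the mean wall-clock duration in
     * milliseconds of measuredRuns further executions.
     */
    public static double meanMillis(Runnable task, int warmupRuns, int measuredRuns) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            task.run(); // warm-up run: not timed
        }
        long totalNanos = 0;
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / measuredRuns;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in workload; in the setup described above, one run
        // would be a complete test run against the directory server.
        Runnable workload = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i % 10);
            }
        };
        System.out.printf("~%.1f ms per run%n", meanMillis(workload, 5, 10));
    }
}
```

Numbers from such a harness stay rough: GC pauses, logging and other processes on the machine all leak into the mean.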

4) The thread dump is not a problem. I have both client and server running under full debugger
control and can plainly see what all the threads are doing. A TCP capture would be very, very
interesting, but I don't know how I can capture traffic which doesn't actually cross a physical
network interface.
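One way to look at traffic that never leaves the machine is to put a small relay between client and server: point the client at the relay's port and let it forward every byte to the real server, logging each chunk in both directions. Below is a minimal sketch, assuming plain (non-TLS) LDAP; the class name LoggingRelay, the argument order and the hex log format are made up for illustration. Note that a relay changes the timing of the connection, which could itself mask a timing-sensitive hang.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class LoggingRelay {

    /** Copies bytes from one socket to the other, logging each chunk, until EOF. */
    static void pump(Socket from, Socket to, String label) {
        byte[] buf = new byte[8192];
        try {
            InputStream in = from.getInputStream();
            OutputStream out = to.getOutputStream();
            int n;
            while ((n = in.read(buf)) != -1) {
                StringBuilder hex = new StringBuilder();
                for (int i = 0; i < n; i++) {
                    hex.append(String.format("%02x ", buf[i] & 0xff));
                }
                System.out.println(label + " " + n + " bytes: " + hex.toString().trim());
                out.write(buf, 0, n);
                out.flush();
            }
            // Propagate the half-close so the other side sees EOF as well.
            to.shutdownOutput();
        } catch (IOException e) {
            System.out.println(label + " stopped: " + e.getMessage());
        }
    }

    /** Relays one accepted client connection to targetHost:targetPort. */
    static void handle(Socket client, String targetHost, int targetPort) {
        try (Socket c = client; Socket server = new Socket(targetHost, targetPort)) {
            Thread upstream = new Thread(() -> pump(c, server, "client->server"));
            upstream.start();
            pump(server, c, "server->client");
            upstream.join();
        } catch (IOException e) {
            System.out.println("relay failed: " + e.getMessage());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Accepts connections until the listener is closed, one handler thread each. */
    public static void serve(ServerSocket listener, String targetHost, int targetPort)
            throws IOException {
        while (true) {
            Socket client = listener.accept();
            new Thread(() -> handle(client, targetHost, targetPort)).start();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical ports): java LoggingRelay 1389 localhost 389
        int listenPort = Integer.parseInt(args[0]);
        try (ServerSocket listener = new ServerSocket(listenPort)) {
            serve(listener, args[1], Integer.parseInt(args[2]));
        }
    }
}
```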

5) Unfortunately, I don't have any chickens at hand (lucky them!), but to draw some conclusions:
one possible explanation would be that the problems are caused by the different IO libraries
used by the different JVMs (see thread dumps below). The cause could also be a problem on
the server side which is triggered by certain timing differences between the client JVMs.
However, the former seems more likely to me, because the hangs don't seem to be influenced
by the client timing itself. In fact, I first got the hangs using my OLM (object to LDAP mapping)
framework, which surely has very, very different timing characteristics.

Here's a stack dump of a JRockit reader thread:
Thread [Thread-4] (Suspended)
	owns: java.io.BufferedInputStream  (id=51)
	jrockit.net.SocketNativeIO.readBytesPinned(int, byte[], int, int, int) line: not available [native method]
	jrockit.net.SocketNativeIO.socketRead(java.io.FileDescriptor, byte[], int, int, int) line: not available
	java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) line: not available
	java.net.SocketInputStream.read(byte[], int, int) line: 129
	java.io.BufferedInputStream.fill() line: 218
	java.io.BufferedInputStream.read1(byte[], int, int) line: 256
	java.io.BufferedInputStream.read(byte[], int, int) line: 313
	com.sun.jndi.ldap.Connection.run() line: 784
	java.lang.Thread.run() line: not available

This is from the IBM JVM:
Thread [Thread-17] (Suspended)
	owns: java.io.BufferedInputStream  (id=45)
	java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) line: not available [native method]
	java.net.SocketInputStream.read(byte[], int, int) line: 155
	java.io.BufferedInputStream.fill() line: 229
	java.io.BufferedInputStream.read1(byte[], int, int) line: 267
	java.io.BufferedInputStream.read(byte[], int, int) line: 324
	com.sun.jndi.ldap.Connection.run() line: 814
	java.lang.Thread.run() line: 788

And this is, finally, the SUN JVM:
Thread [Thread-31] (Suspended)
	owns: java.io.BufferedInputStream  (id=60)
	java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) line: not available [native method]
	java.net.SocketInputStream.read(byte[], int, int) line: 129
	java.io.BufferedInputStream.fill() line: 218
	java.io.BufferedInputStream.read1(byte[], int, int) line: 256
	java.io.BufferedInputStream.read(byte[], int, int) line: 313
	com.sun.jndi.ldap.Connection.run() line: 784
	java.lang.Thread.run() line: 595
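The same kind of information the debugger shows can also be collected from inside a running JVM with Thread.getAllStackTraces(), which makes it easy to list threads currently sitting in a socket read. The sketch below is an illustration, not part of the original test; the class name is made up, and matching on "socketRead" is an assumption based on the frames above, which differ between JVM vendors and versions (as the JRockit dump already shows).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BlockedReaderFinder {

    /**
     * Returns "name (STATE)" for every live thread whose current stack contains a
     * frame whose method name contains methodFragment (e.g. "socketRead").
     */
    public static List<String> threadsIn(String methodFragment) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : e.getValue()) {
                if (frame.getMethodName().contains(methodFragment)) {
                    names.add(e.getKey().getName() + " (" + e.getKey().getState() + ")");
                    break;
                }
            }
        }
        return names;
    }

    /** Prints the stack of every live thread, in a layout similar to the dumps above. */
    public static void dumpAll() {
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            System.out.println("Thread [" + e.getKey().getName() + "] " + e.getKey().getState());
            for (StackTraceElement frame : e.getValue()) {
                System.out.println("\t" + frame);
            }
        }
    }

    public static void main(String[] args) {
        // In the client JVM, the JNDI reader threads shown above would be expected to match.
        for (String name : threadsIn("socketRead")) {
            System.out.println("in socket read: " + name);
        }
    }
}
```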


I'm not saying that this specific class is the culprit; if anything, it is the write side of the
communication that is the problem. But the stack dumps indicate that JRockit has very different
socket-IO code compared to SUN/IBM. Wild guess: IBM licensed 

> Reliable hang of DS during query
> --------------------------------
>
>                 Key: DIRSERVER-586
>                 URL: http://issues.apache.org/jira/browse/DIRSERVER-586
>             Project: Directory ApacheDS
>          Issue Type: Bug
>         Environment: DS 0.9.3, Windows, JDK 1.5
>            Reporter: Jörg Henne
>         Assigned To: Alex Karasulu
>         Attachments: bugreport.zip, TestHang.java
>
>
> When running the attached test, the directory server hangs after executing a slew of
> operations when searching for objects.
> First of all, some background on the test case:
> The attached test case (in the form of an exported Eclipse project) is, unfortunately,
> based on quite a few classes. They are part of a project I am currently working on: an object
> to LDAP mapper with a similar approach to Castor for XML or Hibernate for RDBMS, albeit a
> lot more modest in complexity (I'll hopefully be able to open-source it one day; for now
> it is still much too immature). I have supplied all that stuff mainly for your reference.
> To run the test case, please make sure that the constant "URL" in LDAPDirectoryTest points
> to a valid directory. The URL the context points to must exist. The test will, however,
> subsequently create lots of nodes below it.
> The hang seems to be related to some kind of deadlock, since it doesn't occur once the
> whole test is run via a single context only. To achieve this, set the constant "ONE_CONTEXT"
> to true (by default, each LDAPDirectory uses its own set of contexts).
> If you have any problems running the test, please don't hesitate to contact me.

