directory-dev mailing list archives

From Alex Karasulu <akaras...@apache.org>
Subject Re: Problem when running server-integ tests
Date Mon, 21 Feb 2011 10:37:50 GMT
On Mon, Feb 21, 2011 at 11:24 AM, Emmanuel Lecharny <elecharny@gmail.com> wrote:

> Hi guys,
>
> this weekend, I experienced some bad issues with the server-integ tests
> while trying to use the same LdapDecoder for both client and server. After
> 3 or 4 tests, I get blocked for 30 seconds, then move on to the next test,
> get blocked again, and so on.
>
> Of course, in the end I got test failures, as expected.
>
> I did some deep analysis of what was going on, added some traces, and that
> led me to a point where I get blocked while trying to access any of the
> LdapContainer methods.
>
> Here is an example of the place where I get blocked:
>
>    public void decode( ByteBuffer stream, Asn1Container container ) throws DecoderException
>    {
>        boolean hasRemaining = stream.hasRemaining();
>        System.out.println( "<Has Remaining> : " + hasRemaining + "/" + stream.remaining() );
>
>        // Increment the PDU size counter.
>        int remaining = stream.remaining();
>        System.out.println( "----- 0" );
>        container.incrementDecodeBytes( remaining );
>
>        System.out.println( "----- 1" );
>
> and in the console, the "----- 0" gets printed, but not the "----- 1". The
> incrementDecodeBytes method is very simple:
>
>    public void incrementDecodeBytes( int nb )
>    {
>        System.out.println( "---in incrementDecodedBytes" );
>        decodeBytes += nb;
>    }
>
> (Note that I don't get the "---in incrementDecodedBytes" message either.)
> The method is not synchronized, nor is decodeBytes an AtomicInteger.
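>
> (If that counter ever needed to be thread-safe, it could simply become an
> AtomicInteger; a quick hypothetical sketch, not what is in the code today:)
>
>     // needs: import java.util.concurrent.atomic.AtomicInteger;
>     private AtomicInteger decodeBytes = new AtomicInteger( 0 );
>
>     public void incrementDecodeBytes( int nb )
>     {
>         // Lock-free increment: still nothing here that could block a thread.
>         decodeBytes.addAndGet( nb );
>     }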
>
> I did a threadDump, which is available here : http://pastebin.com/wPNPL1FY
>
> It shows that 15 threads are blocked on monitors:
>
> pool-24-thread-1                      Task    5   51326976  5680140288  waiting on condition  [152900000]
> pool-22-thread-1                      Task    5   50612224  5684236288  waiting on condition  [152ce8000]
> pool-14-thread-4                      Task    5   56862720  5715816448  waiting on condition  [154b06000]
> net.sf.ehcache.CacheManager@6f7cf6b6  Daemon  5   27879424  5657616384  in Object.wait()      [151385000]
> main                                  Task    5   50333696  4300214272  waiting on condition  [1004fe000]
> kdcReplayCache.data                   Daemon  5   50774016  5658677248  waiting on condition  [151488000]
> groupCache.data                       Daemon  5   50778112  5660798976  waiting on condition  [15168e000]
> changePwdReplayCache.data             Daemon  5   27881472  5659738112  waiting on condition  [15158b000]
> SynchWorkerThread                     Task    5   35258368  5652594688  in Object.wait()      [150ebb000]
> Surefire fork timeout timer           Task    5   26988544  5647831040  in Object.wait()      [150a30000]
> Reference Handler                     Daemon  10  50362368  5634641920  in Object.wait()      [14fd9c000]
> Finalizer                             Daemon  8   50364416  5635702784  in Object.wait()      [14fe9f000]
> FelixStartLevel                       Daemon  5   35293184  5670658048  in Object.wait()      [151ff5000]
> FelixPackageAdmin                     Daemon  5   33517568  5655359488  in Object.wait()      [15115e000]
> FelixDispatchQueue                    Task    5   28618752  5661859840  in Object.wait()      [151791000]
>
>
> I have *no idea* why the test gets blocked; I see no rational reason for it
> to block. I'll add the thread-id in the logs to see which thread it is
> running on...
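>
> Something like this, reusing the variables from the decode() snippet above
> (just a sketch of the kind of trace I mean):
>
>     System.out.println( "[" + Thread.currentThread().getName() + "/"
>         + Thread.currentThread().getId() + "] ----- 0, remaining = " + remaining );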
>
> Can Felix be an issue here?


OK, let's just list what we know:

(1) These problems were not present until we experimented with changes to
the codec.
(2) A small part of the codec code may be running inside Felix, but only if
we're hitting some of these extra controls. The codec core is not inside
Felix.

It's probably unlikely that Felix is the culprit, but who knows: are there
specific tests showing this behavior, or is it all of them? This behavior is
weird, so we should suspect everything. Can you post the stack traces from
the thread dump?
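
If it helps, the full traces can also be dumped programmatically right from
the point where the test hangs; a minimal sketch (illustrative only, not code
from our tree):

    import java.util.Map;

    public class ThreadDumper
    {
        public static void dumpAll()
        {
            // Snapshot of every live thread together with its current stack
            Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();

            for ( Map.Entry<Thread, StackTraceElement[]> entry : traces.entrySet() )
            {
                Thread thread = entry.getKey();
                System.out.println( thread.getName() + " (" + thread.getState() + ")" );

                for ( StackTraceElement frame : entry.getValue() )
                {
                    System.out.println( "    at " + frame );
                }
            }
        }
    }

Running "jstack <pid>" against the surefire fork would give the same
information, plus which monitors each thread holds.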

Thanks,
Alex
