apr-dev mailing list archives

From Rainer Jung <rainer.j...@kippdata.de>
Subject Re: [Vote] Release apr-util 1.3.11
Date Thu, 28 Apr 2011 09:24:42 GMT
Hi Stefan,

On 28.04.2011 00:34, Stefan Fritsch wrote:
> On Tuesday 26 April 2011, Rainer Jung wrote:
>> +1 although there are still two problems on Solaris 10 for
>> test_reslist, but not a regression.
>>
>> I built and made check on the following platforms:
>>
>> - Solaris 8 + 10, Sparc
>> - SuSE Linux Enterprise 10 32 and 64 Bit
>> - RedHat Enterprise Linux 5, 64 Bit
>>
>> Using all combinations of:
>>
>> apr 1.3.12 / 1.4.2
>> expat builtin / 2.0.1
>> dso disable / enable
>> Berkeley DB 4.8.30 5.0.26 5.1.19
>> sqlite 3.7.2
>> mysql 6.0.2 (only Solaris)
>> oracle 10.2.0.4.0 (only Solaris)
>>
>> All builds succeeded, all make check ran fine, except for two cases
>> on Solaris 10 (this time not Niagara, but instead old sun4u - V240
>> with 2 CPUs).
>>
>> I reran the tests and couldn't reproduce the problem, so it is not
>> deterministic. Out of 48 build combinations on Solaris 10, only
>> three had a problem. This is similar to 1.3.10, but it is not
>> always the same combinations. As with 1.3.10, the problem happens
>> on Solaris 10 but not on Solaris 8.
>>
>> Details on Solaris 10 test failures
>>
>> - only in testreslist
>> - two types of failures:
>>     - twice crashes (segmentation fault)
>>     - once non-terminating loop
>> - Crashes seem not really related to used apr version (one for 1.3
>> and one for 1.4)
>
> I also get nondeterministic test failures on the Debian build machines,
> mostly hangs in testreslist. It happens on mipsel and sparc much more
> often than on the other architectures, and some architectures had no
> failure at all. Which compiler are you using? If you are using gcc, it
> could be a gcc bug.

On Sparc I use gcc 4.1.2. All builds are 32 Bit.

Concerning the hangs (unterminated loops in my case), I investigated
further for 1.3.10 and confirmed with GDB that there actually was a
cycle in the cleanup list:

(gdb) print c
$1 = (cleanup_t *) 0x38558
(gdb) print *c
$2 = {next = 0x38558, data = 0x38558, plain_cleanup_fn = 0x38710, 
child_cleanup_fn = 0x38798}

So c == c->next, and thus apr_pool_cleanup_kill looped forever.

I didn't check whether that is still true for 1.3.11, and I don't know
why c == c->next.
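
To illustrate why a self-referential entry makes a list traversal hang,
here is a minimal sketch in C. It is not the APR source: the struct is a
cut-down, hypothetical stand-in for cleanup_t, and the names
cleanup_list_has_cycle and find_cleanup_bounded are made up for this
example. The first function detects a cycle with Floyd's
tortoise-and-hare technique; the second walks the list the way a "find
and remove" loop would, but gives up after a caller-chosen number of
steps instead of spinning forever.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down, hypothetical stand-in for a pool cleanup entry
 * (an assumption for illustration, not APR's definition). */
typedef struct cleanup_t {
    struct cleanup_t *next;
    const void *data;
} cleanup_t;

/* Floyd's tortoise-and-hare check: returns 1 if following the next
 * pointers from head never reaches NULL, 0 otherwise. */
static int cleanup_list_has_cycle(const cleanup_t *head)
{
    const cleanup_t *slow = head;
    const cleanup_t *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;        /* advances one entry  */
        fast = fast->next->next;  /* advances two entries */
        if (slow == fast)
            return 1;
    }
    return 0;
}

/* Search for the entry whose data pointer matches, visiting at most
 * max_steps entries. Returns the zero-based position of the match,
 * -1 if the list ended without a match, or -2 if the step bound was
 * hit (which is what happens on a cycle with no matching entry). */
static long find_cleanup_bounded(const cleanup_t *head, const void *data,
                                 long max_steps)
{
    const cleanup_t *c = head;
    long pos = 0;

    assert(max_steps >= 0);
    while (c != NULL) {
        if (pos >= max_steps)
            return -2;
        if (c->data == data)
            return pos;
        c = c->next;
        pos++;
    }
    return -1;
}
```

With an entry whose next pointer refers to itself, as in the GDB output
above, cleanup_list_has_cycle reports a cycle, and an unbounded version
of the search loop would never reach NULL unless it happened to find a
match first.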

Concerning gcc: I use the same gcc for building on Solaris 8 and on
Solaris 10, even the same gcc binary files. I never observed a problem
on the single CPU Sparc 8 system, but did observe problems on Solaris
10 for 1.3.10 and for 1.3.11. Apart from the OS version, the other major
difference is hardware concurrency (I used a Niagara CPU with 6 or 8
cores and 4 times as many strands when testing 1.3.10, and a more
traditional 2 CPU Sparc V240 when testing 1.3.11).

I hope to find some time to check older versions, like 1.3.9 etc., and
maybe also older apr (pool) versions, to see whether I can narrow down
the cause. Unfortunately, so far I could only reproduce the two
problems (unterminated loop, crash) when testing as part of the mass
building, which takes a couple of hours. When running testall after
building, even in loops, I could not reproduce the problems ...

Regards,

Rainer
