From: Rainer Jung
Date: Thu, 28 Apr 2011 11:24:42 +0200
CC: dev@apr.apache.org
Subject: Re: [Vote] Release apr-util 1.3.11
Message-ID: <4DB9325A.7060605@kippdata.de>
In-Reply-To: <201104280034.29694.sf@sfritsch.de>

Hi Stefan,

On 28.04.2011 00:34, Stefan Fritsch wrote:
> On Tuesday 26
> April 2011, Rainer Jung wrote:
>> +1, although there are still two problems on Solaris 10 for
>> test_reslist, but not a regression.
>>
>> I built and ran make check on the following platforms:
>>
>> - Solaris 8 + 10, Sparc
>> - SuSE Linux Enterprise 10, 32 and 64 bit
>> - RedHat Enterprise Linux 5, 64 bit
>>
>> Using all combinations of:
>>
>> apr 1.3.12 / 1.4.2
>> expat builtin / 2.0.1
>> dso disable / enable
>> Berkeley DB 4.8.30 / 5.0.26 / 5.1.19
>> sqlite 3.7.2
>> mysql 6.0.2 (only Solaris)
>> oracle 10.2.0.4.0 (only Solaris)
>>
>> All builds succeeded and all make check runs passed, except for two
>> cases on Solaris 10 (this time not Niagara, but an old sun4u V240
>> with 2 CPUs).
>>
>> I reran the tests and couldn't reproduce the problem, so it is not
>> deterministic. Out of 48 build combinations on Solaris 10, only
>> three had a problem. This is similar to 1.3.10, but it is not
>> always the same combinations. As with 1.3.10, the problem happens
>> on Solaris 10 but not on Solaris 8.
>>
>> Details on the Solaris 10 test failures:
>>
>> - only in testreslist
>> - two types of failures:
>>   - twice a crash (segmentation fault)
>>   - once a non-terminating loop
>> - the crashes do not seem related to the apr version used (one for
>>   1.3 and one for 1.4)
>
> I also get nondeterministic test failures on the Debian build
> machines, mostly hangs in testreslist. It happens on mipsel and sparc
> much more often than on the other architectures, and some
> architectures had no failure at all. Which compiler are you using? If
> you are using gcc, it could be a gcc bug.

On Sparc I use gcc 4.1.2. All builds are 32 bit.

Concerning the hangs (non-terminating loops in my case), I did some more
investigation for 1.3.10 and confirmed using GDB that there actually was
a cycle in the cleanups:

(gdb) print c
$1 = (cleanup_t *) 0x38558
(gdb) print *c
$2 = {next = 0x38558, data = 0x38558, plain_cleanup_fn = 0x38710,
  child_cleanup_fn = 0x38798}

so c == c->next and thus apr_pool_cleanup_kill looped.
I didn't check whether that is still true for 1.3.11. I don't know why
c == c->next.

Concerning gcc: I use the same gcc for building on Solaris 8 and on
Solaris 10, even the same gcc binaries. I never observed a problem on
the single-CPU Solaris 8 Sparc system, but did observe problems on
Solaris 10 for 1.3.10 and for 1.3.11. Apart from the OS version, the
other major difference is hardware concurrency (a Niagara CPU with 6 or
8 cores and four times that number of strands when testing 1.3.10, and
a more traditional 2-CPU Sparc V240 when testing 1.3.11).

I hope I have some time to check older versions like 1.3.9, and maybe
also older apr (pool) versions, to see whether I can narrow down the
cause. Unfortunately, until now I could only reproduce the two problems
(non-terminating loop, crash) when doing the testing as part of the
mass build, which takes a couple of hours. When running testall in a
loop after building, I could not reproduce the problems ...

Regards,

Rainer