apr-dev mailing list archives

From Rainer Jung <rainer.j...@kippdata.de>
Subject Re: [vote] Release apr-util 1.3.10
Date Sat, 02 Oct 2010 11:23:49 GMT
On 01.10.2010 15:22, Jeff Trawick wrote:
> Tarballs are at http://apr.apache.org/dev/dist/.  Windows packages are
> not yet available.
>
> Due to the inclusion of a fix for a potential DOS that could affect
> some library consumers, I hope to get enough feedback within 24 hours
> to release.
>
>   +/-1
>   [+1]  Release apr-util 1.3.10 as GA

I built and made check on the following platforms:

- Solaris 8 + 10, Sparc
- SuSE Linux Enterprise 10 32 and 64 Bit
- RedHat Enterprise Linux 5, 64 Bit

Using all combinations of:

apr 1.3.12 / 1.4.2
expat builtin / 2.0.1
dso disable / enable
Berkeley DB 4.8.30, 5.0.26, 5.1.19 (5.1.19 using a patch)
sqlite 3.7.2
mysql 6.0.2 (only Solaris)
oracle 10.2.0.4.0 (only Solaris)
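
For a rough idea of the matrix size, here is a minimal sketch that enumerates the options above that have more than one value (apr, expat, dso, Berkeley DB). The array and function names are made up for illustration; sqlite, mysql and oracle have a single version each and are left out.

```c
#include <stdio.h>

/* Values copied from the list above. Only the options with more than
 * one value are enumerated; all names here are illustrative. */
static const char *const apr_versions[] = { "1.3.12", "1.4.2" };
static const char *const expat_kinds[]  = { "builtin", "2.0.1" };
static const char *const dso_modes[]    = { "disable", "enable" };
static const char *const bdb_versions[] = { "4.8.30", "5.0.26", "5.1.19" };

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Write one line per combination to `out` (pass NULL to only count)
 * and return the number of combinations. */
static unsigned print_build_matrix(FILE *out)
{
    unsigned n = 0;
    size_t a, e, d, b;

    for (a = 0; a < COUNT(apr_versions); a++)
        for (e = 0; e < COUNT(expat_kinds); e++)
            for (d = 0; d < COUNT(dso_modes); d++)
                for (b = 0; b < COUNT(bdb_versions); b++) {
                    if (out != NULL)
                        fprintf(out, "apr %s, expat %s, dso %s, bdb %s\n",
                                apr_versions[a], expat_kinds[e],
                                dso_modes[d], bdb_versions[b]);
                    n++;
                }
    return n;
}
```

Calling print_build_matrix(stdout) lists 2 x 2 x 2 x 3 = 24 combinations per platform.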

All builds succeeded and all make check runs passed, except for two cases on 
Solaris 10 (Niagara). I reran the tests there and couldn't reproduce the 
problem. The tests are now running in a loop and have not failed again so far.

Differences from 1.3.9:

The CHANGES file is now truncated, i.e. entries from earlier branches are 
removed.

Possible improvements:

- Maybe also provide sha1 hashes for the download files

- Minor glitch for out of tree builds
   (no regression, fix already committed)

- Support Berkeley DB 5.1 (will do)

- Add a note to the build info that the apr source used
   during apu buildconf should contain recent config.(guess|sub)

I also built subversion 1.6.13 and httpd 2.3.8 against this apu.
Builds succeeded, httpd tests OK, subversion tests still running (50% 
done, OK so far).

Details on Solaris 10 test failures

- both in testreslist
- retried both tests more than 100 times, could not reproduce
- built against apr 1.4.2

1) Bus Error
============

pstack shows me

core 'core.testall.4858' of 4858:       ./testall
-----------------  lwp# 1 / thread# 1  --------------------
  ff045578 __pollsys (ffbfefb0, 0, ffbff018, 0, 0, 0) + 8
  fefe6ad4 pselect  (ffbfefb0, ff0723d0, ff0723d0, 0, ffbff018, 0) + 1c8
  fefe6e4c select   (0, 0, 0, 0, ffbff080, 18850) + a0
  ff2f1c94 apr_sleep (0, 2710, 0, 38820, 38820, 0) + 4c
  0001886c my_destructor (0, 387c8, 385a0, 38828, 38820, 18850) + 1c
  ff3744b4 destroy_resource (387f8, 38820, 0, 38820, 38820, 38787) + 10
  ff3746a4 reslist_maint (387f8, 38470, 38470, 38828, 38820, 18850) + d8
  ff374880 apr_reslist_create (0, 3, a, 14, 0, 88b8) + 108
  00018a58 test_reslist (2fa60, 0, 98948, 18d44, 0, 2710) + c8
  0001382c abts_run_test (2fa60, 18990, 0, 2f160, fec30200, fec30240) + 48
  00018d44 testreslist (2f850, 1b528, 0, 0, 0, 18d10) + 34
  00013fb4 main     (1, 2ebac, ffbff7b4, 2f160, fec30200, fec30240) + 11c
  00013260 _start   (0, 0, 0, 0, 0, 0) + 5c

All other threads are zombies (dummy_worker) or in

ff04484c __lwp_park (38558, 38528, 0, 0, 1, 0) + 14

except the following thread:

-----------------  lwp# 37 / thread# 37  --------------------
  ff371464 thread_pool_func (38768, 384b0, 0, 0, 0, 0) + 34
  ff2efea0 dummy_worker (38768, fe37c000, 0, 0, ff2efe94, 1) + c
  ff0447a8 _lwp_start (0, 0, 0, 0, 0, 0)

GDB says:

Thread 4 (process 70394    ):
#0  0xff045578 in __pollsys () from /lib/libc.so.1
No symbol table info available.
#1  0xff0386a8 in _pollsys () from /lib/libc.so.1
No symbol table info available.
#2  0xfefe6adc in pselect () from /lib/libc.so.1
No symbol table info available.
#3  0xfefe6e54 in select () from /lib/libc.so.1
No symbol table info available.
#4  0xff2f1c9c in apr_sleep (t=10000) at time/unix/time.c:246
         tv = {tv_sec = 0, tv_usec = 10000}
#5  0x00018874 in my_destructor (resource=0x0, params=0x387c8, 
pool=0x385a0) at 
/shared/build/dev/httpd/sources/apr-util/1.3/1.3.10/apr-util-1.3.10/test/testreslist.c:89
No locals.
#6  0xff3744bc in destroy_resource (reslist=0x387f8, res=0x38820) at 
/shared/build/dev/httpd/sources/apr-util/1.3/1.3.10/apr-util-1.3.10/misc/apr_reslist.c:135
No locals.
...

So this one seems innocent, but thread 37:

#0  0xff371464 in thread_pool_func (t=0x38768, param=0x384b0) at 
/shared/build/dev/httpd/sources/apr-util/1.3/1.3.10/apr-util-1.3.10/misc/apr_thread_pool.c:220
220             APR_RING_REMOVE(elt, link);
(gdb) bt full
#0  0xff371464 in thread_pool_func (t=0x38768, param=0x384b0) at 
/shared/build/dev/httpd/sources/apr-util/1.3/1.3.10/apr-util-1.3.10/misc/apr_thread_pool.c:220
         task = (apr_thread_pool_task_t *) 0x0
         wait = 990831775318016
#1  0xff2efea8 in dummy_worker (opaque=0x38768) at 
threadproc/unix/thread.c:142
No locals.
#2  0xff0447b0 in _lwp_start () from /lib/libc.so.1
No symbol table info available.
#3  0xff0447b0 in _lwp_start () from /lib/libc.so.1
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

could be the culprit.

That seems to be more of an apr issue though.


2) Endless loop
===============

-----------------  lwp# 1 / thread# 1  --------------------
  ff2e5788 apr_pool_cleanup_kill (38438, 38478, ff372608, f9, 1, 38ca0) + 34
  00018ab8 ???????? (2fa28, 0, 98948, 18d10, 0, 3a98)
  000137f8 abts_run_test (2fa28, 1895c, 0, 2f128, fec30200, fec30240) + 48
  00018d10 testreslist (2f818, 1b4f0, 0, 0, 0, 18cdc) + 34
  00013f80 main     (1, 2eb74, ffbff794, 2f128, fec30200, fec30240) + 11c
  0001322c _start   (0, 0, 0, 0, 0, 0) + 5c

All other threads are in lwp_park().

GDB:

Loops in pool cleanup, two snapshots:

[Switching to Thread 1 (LWP 1)]
0xff2e5790 in apr_pool_cleanup_kill (p=0x38438, data=0x38478, 
cleanup_fn=0xff372608 <thread_pool_cleanup>) at memory/unix/apr_pools.c:2241
2241            c = c->next;
(gdb) bt full
#0  0xff2e5790 in apr_pool_cleanup_kill (p=0x38438, data=0x38478, 
cleanup_fn=0xff372608 <thread_pool_cleanup>) at memory/unix/apr_pools.c:2241
         c = (cleanup_t *) 0x38558
         lastp = (cleanup_t **) 0x38558
#1  0xff2e5890 in apr_pool_cleanup_run (p=0x38438, data=0x38478, 
cleanup_fn=0xff372608 <thread_pool_cleanup>) at memory/unix/apr_pools.c:2298
No locals.
#2  0x00018ac0 in test_reslist (tc=0x2fa28, data=0x0) at 
/shared/build/dev/httpd/sources/apr-util/1.3/1.3.10/apr-util-1.3.10/test/testreslist.c:252
         i = 25
         rv = -13163000
         rl = (apr_reslist_t *) 0x38820
         params = (my_parameters_t *) 0x387f0
         thrp = (apr_thread_pool_t *) 0x38478
         thread_info = {{tid = 0, tc = 0x2fa28, reslist = 0x38820, 
work_delay_sleep = 15000}, {tid = 1, tc = 0x2fa28, reslist = 0x38820, 
work_delay_sleep = 15000}, {
     tid = 2, tc = 0x2fa28, reslist = 0x38820, work_delay_sleep = 
15000}, {tid = 3, tc = 0x2fa28, reslist = 0x38820, work_delay_sleep = 
15000}, {tid = 4, tc = 0x2fa28,
     reslist = 0x38820, work_delay_sleep = 15000}, {tid = 5, tc = 
0x2fa28, reslist = 0x38820, work_delay_sleep = 15000}, {tid = 6, tc = 
0x2fa28, reslist = 0x38820,
     work_delay_sleep = 15000}, {tid = 7, tc = 0x2fa28, reslist = 
0x38820, work_delay_sleep = 15000}, {tid = 8, tc = 0x2fa28, reslist = 
0x38820, work_delay_sleep = 15000},
   {tid = 9, tc = 0x2fa28, reslist = 0x38820, work_delay_sleep = 15000}, 
{tid = 10, tc = 0x2fa28, reslist = 0x38820, work_delay_sleep = 15000}, 
{tid = 11, tc = 0x2fa28,
     reslist = 0x38820, work_delay_sleep = 15000}, {tid = 12, tc = 
0x2fa28, reslist = 0x38820, work_delay_sleep = 15000}, {tid = 13, tc = 
0x2fa28, reslist = 0x38820,
     work_delay_sleep = 15000}, {tid = 14, tc = 0x2fa28, reslist = 
0x38820, work_delay_sleep = 15000}, {tid = 15, tc = 0x2fa28, reslist = 
0x38820,
     work_delay_sleep = 15000}, {tid = 16, tc = 0x2fa28, reslist = 
0x38820, work_delay_sleep = 15000}, {tid = 17, tc = 0x2fa28, reslist = 
0x38820,
     work_delay_sleep = 15000}, {tid = 18, tc = 0x2fa28, reslist = 
0x38820, work_delay_sleep = 15000}, {tid = 19, tc = 0x2fa28, reslist = 
0x38820,
     work_delay_sleep = 15000}, {tid = 20, tc = 0x2fa28, reslist = 
0x38820, work_delay_sleep = 15000}, {tid = 21, tc = 0x2fa28, reslist = 
0x38820,
     work_delay_sleep = 15000}, {tid = 22, tc = 0x2fa28, reslist = 
0x38820, work_delay_sleep = 15000}, {tid = 23, tc = 0x2fa28, reslist = 
0x38820,
     work_delay_sleep = 15000}, {tid = 24, tc = 0x2fa28, reslist = 
0x38820, work_delay_sleep = 15000}}
#3  0x00013800 in abts_run_test (ts=0x2fa28, f=0x1895c <test_reslist>, 
value=0x0) at 
/shared/build/dev/httpd/sources/apr-util/1.3/1.3.10/apr-util-1.3.10/test/abts.c:169
         tc = (abts_case *) 0x2fa28
         ss = (sub_suite *) 0x31268
#4  0x00018d18 in testreslist (suite=0x2f818) at 
/shared/build/dev/httpd/sources/apr-util/1.3/1.3.10/apr-util-1.3.10/test/testreslist.c:271
No locals.
#5  0x00013f88 in main (argc=1, argv=0x2eb74) at 
/shared/build/dev/httpd/sources/apr-util/1.3/1.3.10/apr-util-1.3.10/test/abts.c:411
         i = 17
         list_provided = 0
         suite = (abts_suite *) 0x2f818


and


[Switching to Thread 1 (LWP 1)]
apr_pool_cleanup_kill (p=0x38438, data=0x38478, cleanup_fn=0xff372608 
<thread_pool_cleanup>) at memory/unix/apr_pools.c:2240
2240            lastp = &c->next;
(gdb) bt full
#0  apr_pool_cleanup_kill (p=0x38438, data=0x38478, 
cleanup_fn=0xff372608 <thread_pool_cleanup>) at memory/unix/apr_pools.c:2240
         c = (cleanup_t *) 0x38558
         lastp = (cleanup_t **) 0x38558
#1  0xff2e5890 in apr_pool_cleanup_run (p=0x38438, data=0x38478, 
cleanup_fn=0xff372608 <thread_pool_cleanup>) at memory/unix/apr_pools.c:2298
No locals.
...

and we have:

(gdb) print c
$1 = (cleanup_t *) 0x38558
(gdb) print *c
$2 = {next = 0x38558, data = 0x38558, plain_cleanup_fn = 0x38710, 
child_cleanup_fn = 0x38798}

so c == c->next and we will loop endlessly.
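
To illustrate why a self-referencing entry hangs this kind of walk, here is a minimal sketch of an unlink loop over a singly linked list, with a guard added. It is a simplified stand-in, not the code from apr_pools.c: node_t, unlink_entry and the max_steps parameter are hypothetical names chosen for the example.

```c
#include <stddef.h>

/* Hypothetical stand-in for a cleanup list entry; not the APR type. */
typedef struct node {
    struct node *next;
    const void  *data;
} node_t;

/* Unlink the first entry whose data pointer matches.
 * Returns 1 if an entry was unlinked, 0 if none matched, and -1 if the
 * list looks corrupt (an entry points to itself, or more than max_steps
 * entries were visited). Without the guard, an entry with
 * c == c->next and non-matching data would keep the loop running forever. */
static int unlink_entry(node_t **head, const void *data, size_t max_steps)
{
    node_t **lastp = head;
    node_t *c = *head;
    size_t steps = 0;

    while (c != NULL) {
        if (c == c->next || ++steps > max_steps)
            return -1;          /* corrupt list, give up instead of spinning */
        if (c->data == data) {
            *lastp = c->next;   /* bypass the matching entry */
            c->next = NULL;
            return 1;
        }
        lastp = &c->next;
        c = c->next;
    }
    return 0;
}
```

In the snapshot above the entry's data (0x38558) differs from the data being searched for (0x38478), so the match branch is never taken and only a check of this kind would end the walk.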
