www-apache-bugdb mailing list archives

From Billy Halsey <Billy.Hal...@Sun.COM>
Subject os-solaris/9785: When ap_extended_status == 1, parent process dies with SIGBUS
Date Sun, 10 Feb 2002 11:53:39 GMT

>Number:         9785
>Category:       os-solaris
>Synopsis:       When ap_extended_status == 1, parent process dies with SIGBUS
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    apache
>State:          open
>Class:          sw-bug
>Submitter-Id:   apache
>Arrival-Date:   Sun Feb 10 04:00:01 PST 2002
>Originator:     Billy.Halsey@Sun.COM
>Release:        2.0.32-dev
uname -a: SunOS trinity 5.8 Generic_108528-12 sun4u sparc SUNW,Ultra-5_10
compiler: gcc-2.95.3 (also exhibited with Sun Workshop 6 update 1 release 1)

From error_log:
[Sun Feb 10 02:22:57 2002] [notice] seg fault or similar nasty error detected in the parent

It appears that the 'ws' variable in ap_update_child_status_from_indexes() is getting initialized
with a non-aligned value.

Stack backtrace:

[bhalsey@trinity 762]$ dbx httpd core.httpd.17675 
Reading symbolic information for httpd
core file header read successfully
Reading symbolic information for rtld /usr/lib/ld.so.1
dbx: program is not active
Reading symbolic information for libaprutil.so.0
Reading symbolic information for libapr.so.0
Reading symbolic information for libsendfile.so.1
Reading symbolic information for libm.so.1
Reading symbolic information for libsocket.so.1
Reading symbolic information for libnsl.so.1
Reading symbolic information for libdl.so.1
Reading symbolic information for libz.so.1
Reading symbolic information for libexpat.so.0
Reading symbolic information for libpthread.so.1
Reading symbolic information for libc.so.1
Reading symbolic information for libmp.so.2
Reading symbolic information for libc_psr.so.1
Reading symbolic information for libthread.so.1
detected a multithreaded program
dbx: program is not active
t@1 (l@1) terminated by signal BUS (Bus Error)
(dbx) where
current thread: t@1
=>[1] ap_update_child_status_from_indexes(0x0, 0x0, 0x1, 0x0, 0x10, 0xffbef8d0), at 0x77c38
  [2] make_child(0x16c388, 0x0, 0x0, 0x0, 0x10, 0xc2308), at 0x69914
  [3] startup_children(0x5, 0xb7c00, 0xb6400, 0x4, 0x1, 0x0), at 0x69a34
  [4] ap_mpm_run(0x5, 0x10a428, 0x16c388, 0xb6800, 0xb6400, 0x0), at 0x69d48
  [5] main(0xc0380, 0xc2308, 0x0, 0xb6800, 0xffbefa04, 0xb5800), at 0x6f620

Looking at the disassembly around the area indicated:

0x00077c18: ap_update_child_status_from_indexes+0x0094:	ld      [%o0 + 0x150], %o1
0x00077c1c: ap_update_child_status_from_indexes+0x0098:	tst     %o1
0x00077c20: ap_update_child_status_from_indexes+0x009c:	be,a    ap_update_child_status_from_indexes+0x1c0
(dbx) dis
0x00077c24: ap_update_child_status_from_indexes+0x00a0:	mov     %l4, %i0
0x00077c28: ap_update_child_status_from_indexes+0x00a4:	call    0x000b3ff8 [PLT 63: apr_time_now]
0x00077c2c: ap_update_child_status_from_indexes+0x00a8:	nop     
0x00077c30: ap_update_child_status_from_indexes+0x00ac:	cmp     %i2, 0x2
0x00077c34: ap_update_child_status_from_indexes+0x00b0:	be      ap_update_child_status_from_indexes+0xc4
0x00077c38: ap_update_child_status_from_indexes+0x00b4:	std     %o0, [%l0 + 0x48]
0x00077c3c: ap_update_child_status_from_indexes+0x00b8:	tst     %i2
0x00077c40: ap_update_child_status_from_indexes+0x00bc:	bne     ap_update_child_status_from_indexes+0xe4
0x00077c44: ap_update_child_status_from_indexes+0x00c0:	tst     %i3

Looking at the C code in scoreboard.c, this appears to map to lines 397-422 (revision 1.55).

Looking at global variables as they relate to this function:

(dbx) display ap_extended_status
ap_extended_status = 1

And now, to look at the registers:

(dbx) regs
current thread: t@1
current frame:  [1]
g0-g3	 0x00000000 0x000c0000 0x0000fd9e 0x00000006
g4-g7	 0x0000005c 0x00000000 0x00000000 0x000bae00
o0-o3	 0x0003999f 0xceb6bfde 0x00000000 0x000b7400
o4-o7	 0x00000000 0x00000000 0xffbef7a8 0x00077c28
l0-l3	 0xfefc100c 0xfefc000c 0x00000000 0xfefc100c
l4-l7	 0x00000000 0x00000000 0x00000000 0xff34cd14
i0-i3	 0x00000000 0x00000000 0x00000001 0x00000000
i4-i7	 0x00000010 0xffbef8d0 0xffbef820 0x00069914
y	 0x00000000
psr	 0xfe901002
pc	 0x00077c38:ap_update_child_status_from_indexes+0xb4	std     %o0, [%l0 + 0x48]
npc	 0x00077c3c:ap_update_child_status_from_indexes+0xb8	tst     %i2
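The %o0/%o1 pair in the dump above can be checked against the call to apr_time_now() at +0xa4. The C sketch below assumes the 32-bit SPARC convention of returning a 64-bit result with the high word in %o0 and the low word in %o1, and assumes apr_time_t counts microseconds since the Unix epoch. The helper names sparc_pair_to_i64 and apr_usec_to_utc are hypothetical. Under those assumptions, 0x0003999f/0xceb6bfde decodes to Sun Feb 10 10:22:57 2002 UTC (02:22:57 PST), the same second as the error_log entry. That suggests the register holds a timestamp rather than an address.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Join the high (%o0) and low (%o1) words of a 32-bit SPARC register
 * pair into one 64-bit value (assumed 64-bit return convention). */
static int64_t sparc_pair_to_i64(uint32_t hi, uint32_t lo)
{
    return (int64_t)(((uint64_t)hi << 32) | (uint64_t)lo);
}

/* Treat 'usec' as microseconds since the Unix epoch (assumed apr_time_t
 * unit) and break the whole-second part down into UTC fields.
 * Returns 1 on success, 0 if gmtime() cannot represent the value. */
static int apr_usec_to_utc(int64_t usec, struct tm *out)
{
    assert(out != NULL);
    time_t secs = (time_t)(usec / 1000000);
    struct tm *utc = gmtime(&secs);   /* points to static storage */
    if (utc == NULL)
        return 0;
    *out = *utc;                      /* copy before another call reuses it */
    return 1;
}
```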

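As you can see, the offending instruction is std %o0, [%l0 + 0x48]. On SPARC, std stores the
register pair %o0/%o1 as one 64-bit doubleword and raises a bus error unless the effective address
is 8-byte aligned. Here %l0 (presumably 'ws') is 0xfefc100c, so the target address is 0xfefc1054,
which is only 4-byte aligned. The value in %o0 (0x0003999f) is not the problem: it is just the high
word of the 64-bit value that apr_time_now() returned in %o0/%o1 a few instructions earlier.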
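A minimal sketch of the alignment arithmetic, using the register values from the dump: the effective address of std %o0, [%l0 + 0x48] is %l0 + 0x48, and a doubleword store (std, or stx on V9) needs that address to be a multiple of 8. The helper names below are hypothetical; the check simply masks off the low bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Effective address of "[base + imm]" with 32-bit wraparound, as in
 * "std %o0, [%l0 + 0x48]". */
static uint32_t effective_address(uint32_t base, int32_t imm)
{
    return base + (uint32_t)imm;
}

/* Number of bytes by which 'addr' misses an 'align' boundary
 * ('align' must be a power of two). */
static uint32_t misalignment(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1u)) == 0);
    return addr & (align - 1u);
}

/* std/stx require an 8-byte-aligned effective address. */
static bool doubleword_store_ok(uint32_t addr)
{
    return misalignment(addr, 8) == 0;
}
```

Since 0x48 is itself a multiple of 8, the 4-byte misalignment comes entirely from the base pointer in %l0, which is consistent with 'ws' being misaligned rather than with anything being wrong in %o0.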

As a side note, it's interesting that gcc still emits the std opcode, which SPARC V9 deprecates
in favor of stx. (stx has the same 8-byte alignment requirement, so it would not avoid this fault.)

Here are the contents of config.nice, which faithfully reproduces the problem for me. I haven't
tried it, but I imagine that disabling extended server status (ExtendedStatus Off) would be a viable
workaround. I would be happy to provide httpd.conf, the core file, and/or the executable if you desire.

Contents of config.nice:

#! /bin/sh
# Created by configure

"./configure" \
"--prefix=/mp3/httpd" \
"--enable-auth-anon" \
"--enable-file-cache" \
"--enable-echo" \
"--enable-cache" \
"--enable-mem-cache" \
"--enable-disk-cache" \
"--enable-ext-filter" \
"--enable-case-filter" \
"--enable-case-filter-in" \
"--enable-deflate" \
"--enable-mime-magic" \
"--enable-cern-meta" \
"--enable-expires" \
"--enable-headers" \
"--enable-usertrack" \
"--enable-unique-id" \
"--enable-http" \
"--enable-dav" \
"--enable-info" \
"--enable-cgi" \
"--enable-cgid" \
"--enable-dav-fs" \
"--enable-speling" \
"--enable-rewrite" \
"--enable-so" \
"--with-z=/usr/lib" \
"--with-mpm=prefork" \

Note that I tried this with both --with-mpm=prefork and --with-mpm=worker, with the same results in
both cases.

I would investigate struct scoreboard and any functions that modify ap_scoreboard_image to
determine where a misaligned pointer could be introduced.
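To illustrate the kind of bug worth looking for, here is a purely hypothetical C sketch: demo_parent_score, demo_worker_score, the 16-byte header and the count of 5 are all invented, and this is not Apache's actual scoreboard layout. It models one shared block carved into consecutive arrays. If a preceding array's total size is not a multiple of 8, the next array, and the 64-bit field inside it, lands on a 4-byte boundary. Rounding each carved offset up to 8 avoids that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structures, NOT Apache's real scoreboard types. */
typedef struct {
    int32_t pid;
    int32_t generation;
    int32_t quiescing;
} demo_parent_score;              /* 12 bytes, needs only 4-byte alignment */

typedef struct {
    int32_t status;
    int32_t thread_num;
    int64_t last_used;            /* 64-bit field stored with std on SPARC */
} demo_worker_score;

/* Round n up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Offset of the worker array when 'nparents' parent slots follow a
 * 'header_size'-byte header, optionally rounded up to 8 bytes. */
static size_t worker_array_offset(size_t header_size, size_t nparents,
                                  int round_to_8)
{
    size_t off = header_size + nparents * sizeof(demo_parent_score);
    return round_to_8 ? align_up(off, 8) : off;
}

/* Is the first worker's last_used field 8-byte aligned, assuming the
 * shared block itself starts on an 8-byte boundary? */
static int last_used_is_8_aligned(size_t worker_off)
{
    return (worker_off + offsetof(demo_worker_score, last_used)) % 8 == 0;
}
```

With the invented numbers, the unrounded worker array starts at offset 76 and its last_used field at 84, both 4 mod 8, the same pattern seen in %l0 (0xfefc100c).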
 [In order for any reply to be added to the PR database, you need]
 [to include <apbugs@Apache.Org> in the Cc line and make sure the]
 [subject line starts with the report component and number, with ]
 [or without any 'Re:' prefixes (such as "general/1098:" or      ]
 ["Re: general/1098:").  If the subject doesn't match this       ]
 [pattern, your message will be misfiled and ignored.  The       ]
 ["apbugs" address is not added to the Cc line of messages from  ]
 [the database automatically because of the potential for mail   ]
 [loops.  If you do not include this Cc, your reply may be ig-   ]
 [nored unless you are responding to an explicit request from a  ]
 [developer.  Reply only with text; DO NOT SEND ATTACHMENTS!     ]
