Phew, finally got a core dump and this stack trace:
#0 0x52e50 in ap_proxy_send_fb ()
#1 0x514c4 in ap_proxy_http_handler ()
#2 0x466d0 in mod_perl_set_opmask ()
#3 0x899ac in ap_invoke_handler ()
#4 0xa4010 in ap_some_auth_required ()
#5 0xa4090 in ap_process_request ()
#6 0x98f8c in ap_child_terminate ()
#7 0x992e8 in ap_child_terminate ()
#8 0x99828 in ap_child_terminate ()
#9 0x99fb4 in ap_child_terminate ()
#10 0x9a818 in main ()
mod_perl certainly appears to be involved (see frame #2, mod_perl_set_opmask)...
By the way, to potentially save someone else the pain of figuring out
how to get a core file from a setuid Apache httpd process in Solaris,
without knowing which httpd child might be the one to die, here's what
I ended up doing:
# for pid in `ps -eaf | fgrep httpd_1.3.3 | cut -d' ' -f4`
> do
> truss -f -l -t\!all -S SIGSEGV -p $pid 2>&1 | egrep SIGSEGV &
> done
The undocumented '-S' flag to truss halts the process in place upon
receipt of a given signal (SIGSEGV in this case). With the process
stopped, I used:
# gcore <PID>
to generate a core file, then
# gdb httpd_1.3.3 core.<PID>
(gdb) where
to get the backtrace.
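For anyone repeating this: the PID selection in the loop above depends on
how ps pads its columns, since cut splits on single spaces. Here is a rough
sketch of a sturdier variant. It assumes `ps -ef` prints the PID in the
second column and that an awk with ENVIRON support is available (nawk on
older Solaris). The helper name httpd_pids and the PROC_NAME variable are
made up for illustration.

```shell
# httpd_pids: hypothetical helper. Reads `ps -ef` output on stdin and prints
# the PID (second column) of every line that mentions the given name.
# The name is passed through the environment rather than on awk's command
# line, so the awk process itself does not show up as a match in ps.
httpd_pids() {
    PROC_NAME="$1" awk '
        index($0, ENVIRON["PROC_NAME"]) && $2 ~ /^[0-9]+$/ { print $2 }
    '
}

# Intended use (the truss invocation is unchanged from the loop above):
#   for pid in `ps -ef | httpd_pids httpd_1.3.3`
#   do
#       truss -f -l -t\!all -S SIGSEGV -p $pid 2>&1 | egrep SIGSEGV &
#   done
```

The numeric check on the second column skips the ps header line and any
line where the columns did not split as expected.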
The following Usenet thread was a big help:
http://www.dejanews.com/=zzz_maf/dnquery.xp?search=thread&svcclass=dnserver&recnum=%3c6iueei$fjt@engnews1.Eng.Sun.COM%3e%231/1
Michael Smith writes:
> Ralf S. Engelschall wrote:
>
> > In article <361B7975.F2B954AC@iii.co.uk> you wrote:
> >
> > > Probably not a show-stopper but I'm seeing segmentation faults under
> > > 1.3.3 (also get them under 1.3.2); which look like this in the error
> > > log:
> >
> > > [Wed Oct 7 16:20:52 1998] [notice] httpd: child pid 21631 exit signal
> > > Segmentation Fault (11)
> > > [Wed Oct 7 16:21:43 1998] [notice] httpd: child pid 21623 exit signal
> > > Segmentation Fault (11)
> > > [...]
> > > Anyone else seen something similar?
> >
> > ARGL! Good that you say something. I'm also currently debugging (mod_ssl's
> > gcache stuff) 1.3.3 and get a similar "child exit" problem. I thought it was a
> > gcache bug recently introduced, but after inspecting the recent code changes I
> > now think it's more a bug in Apache. I'm still debugging, so cannot say more
> > than my problem is a similar one with 1.3.3.... more details coming when I
> > know more.
>
> Yeh, we're trying to get some more information but haven't been able to persuade
> the Solaris kernel to let 'nobody' dump core. We're not using mod_ssl so it's
> not an issue with that module for us! Unfortunately we only see it on our "live"
> system, where it appears every minute or so (= about every 500 hits).
>
> More details coming when we know more!
>
--
Doug Bloebaum interactive investor
Systems Engineer 105-109 Strand, London WC2R 0AB
blabes@iii.co.uk http://www.iii.co.uk