httpd-bugs mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 35974] - Occasional seg fault/bus error in NFS hosted includes-parsed files
Date Wed, 03 Aug 2005 12:23:57 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=35974>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=35974

------- Additional Comments From stuart@terminus.co.uk  2005-08-03 14:23 -------
> It would be useful to monitor memory use over time to be able to correlate the
> child crashes with memory exhaustion.  (using sar or just vmstat, for example)

We do have monitoring systems which should pick that kind of event up, but I now
have sar running as well.

> - try an "--enable-pool-debug" build, if possible; such a build will be much
> slower and consume much more memory overall but might help tracking the issue
> down

This is being done. Will have to check the memory usage, but slower shouldn't
matter as we're only testing on one server out of a group.

> - check the r->uri in case this an issue specific to a particular resource, or
> at least, to localize testing against some specific resources.

Yeah, we tried this before. We managed to reconstruct a couple of requests from
the cores produced by the non-debug build (using adb and some knowledge of the
structures involved), and replayed them without problems. Having inspected the
more recent cores (we had a few more overnight), there's still no pattern. We
suspected cookies at one point (as that's often where testing doesn't mirror
real traffic), but again they look normal. A couple of crashes seem to have
happened outside request handling (e.g. in apr_pool_clear, called from the loop
in child_main).

As I mentioned, we've got some more backtraces now. There hasn't been a repeat
of the NULL pool error; they're more similar to the first backtrace I attached.
One common thing I've noticed is that whilst the errors occur in a few
different places (mostly within apr_pools.c or apr_buckets_alloc.c), they mostly
involve a pointer to an apr_memnode_t holding a bad address.
For example:

(gdb) bt
#0  0xff1dedb4 in allocator_alloc (allocator=0x3c0ef0, size=8192) at apr_pools.c:219
#1  0xff1dd6d8 in apr_pool_create_ex (newpool=0x3c4044, parent=0x3c3a70, abort_fn=0, allocator=0x3c0ef0) at apr_pools.c:804
#2  0x000d1f18 in core_output_filter (f=0x3c3f10, b=0x3dc8f0) at core.c:4182
#3  0x000c14bc in ap_pass_brigade (next=0x3c3f10, bb=0x3dc8f0) at util_filter.c:512
#4  0x00081308 in ap_http_header_filter (f=0x3d2838, b=0x3dc868) at http_protocol.c:1668
#5  0x000c14bc in ap_pass_brigade (next=0x3d2838, bb=0x3dc868) at util_filter.c:512
...

Line 219 of apr_pools.c being:
    if ((*ref = node->next) == NULL && i >= max_index) {
So:

(gdb) p node
$3 = (apr_memnode_t *) 0x4f6a5536
(gdb) p *node
Cannot access memory at address 0x4f6a5536

If you'd like the full backtrace of that, or indeed the other new backtraces,
just let me know.
