apr-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Cliff Woolley <jwool...@virginia.edu>
Subject Re: [PATCH] Re: the most common seg fault on daedalus
Date Thu, 11 Apr 2002 22:49:42 GMT
On Wed, 10 Apr 2002, Greg Ames wrote:

> We have another similar looking dump, with this patch on.  It's
> /usr/local/apache2.0.35c/corefiles/httpd.core.2
> #0  mmap_cleanup (themmap=0x8153ac0) at mmap.c:98
> #1  0x280cc8f0 in apr_mmap_delete (mm=0x8153ac0) at mmap.c:202
> #2  0x280ad926 in mmap_destroy (data=0x8131088) at apr_buckets_mmap.c:82
> #3  0x280adf08 in apr_brigade_cleanup (data=0x814c4f8) at apr_brigade.c:86
> #4  0x280adebe in brigade_cleanup (data=0x814c4f8) at apr_brigade.c:72
> #5  0x280cdd3b in run_cleanups (c=0x81393f8) at apr_pools.c:1713
> #6  0x280cd51c in apr_pool_destroy (pool=0x813f010) at apr_pools.c:638
> #7  0x805ea1c in ap_process_http_connection (c=0x812c120) at http_core.c:300

Okay, so this is a totally different problem.  Tangentially related at

Here's the situation:

 1) create a brigade in pool p (call it r->pool)
 2) create an mmap in the same pool or a subpool
      (note: it might have to be a subpool, but I'm not sure)
 3) put the mmap in an mmap bucket
 4) clear pool p

Because the mmap was created *after* the brigade and pool cleanups get run
in LIFO order, #4 implies the following:

  4a) mmap_cleanup in mmap.c will munmap the region and set mm->mm to -1
      (note: the -1 is what was *supposed* to save us here)
  4b) brigade_cleanup will destroy the mmap bucket via mmap_destroy from
  4c) mmap_destroy discovers it's the last remaining reference to the
      mmap, and so calls apr_mmap_delete
  4d) boom

The question is, on 4c, why does it fail?  Because even though we call
apr_mmap_delete, the mm->mm==-1 should cause apr_mmap_delete to fail with
APR_ENOENT rather than crashing.  But the kicker: mm itself is no longer
in accessible memory:

Program terminated with signal 11, Segmentation fault.
(gdb) frame 0
#0  mmap_cleanup (themmap=0x8153ac0) at mmap.c:98
98          mm->mm = (void *)-1;
(gdb) p mm
$1 = (apr_mmap_t *) 0x8153ac0
(gdb) p *mm
Cannot access memory at address 0x8153ac0.


So what can we do?

 Option 1: go back to having mmap_destroy in apr_buckets_mmap.c *not* call
           apr_mmap_delete().  But the reason we put it in there in the
           first place was to accommodate large file buckets that get
           apr_bucket_read(); we mmap the thing in like 4MB increments,
           but we only want 4MB mapped at a time, not the whole file.

    Note that having mmap_destroy call apr_mmap_delete works
    fine as long as you delete the mmap bucket before cleaning up
    the pool, or at least call apr_brigade_cleanup when you're done
    with the brigade rather than leaving it to the pool cleanup.
    But that defeats the purpose.  Ideally mmap_destroy would be
    able to figure out whether the apr_mmap_t had already been cleaned
    up, and if not, skip the apr_mmap_delete.  Which brings me to:

 Option 2: apr_bucket_mmap_make() could register a cleanup on the
           apr_mmap_t's pool that, when run, would do the Right Thing
           to account for the mmap having been cleaned up.  But what
           is the Right Thing?

    Option 2a: The Right Thing might be to just set a flag in the
               mmap bucket that says "this mmap is no longer valid,
               never try to access it again."  (But that's inconsistent
               with what pool buckets do, so...)

    Option 2b: The Right Thing might be to copy the data in the mmap
               to the heap and morph ourselves to a heap bucket just
               as the pool buckets do.  (Ouch. *)

    Option 2c: The Right Thing might be to morph ourselves to an
               immortal bucket containing the empty string [or an
               error metadata bucket of some type? that might be

    Note that an option that might come to mind that would NOT work is
    to just have the cleanup delete the mmap bucket from whatever brigade
    it's in.  That won't work simply because it's not safe -- the bucket
    has no way to know whether any functions might still be holding
    a pointer to it.

* I say ouch because that also means anytime we cleanup a brigade's pool
and the brigade still has pool buckets in it, if the data in the pool
bucket was allocated from the same pool as the brigade, we morph the pool
bucket to a heap bucket (copy the data) and immediately turn around and
destroy it again.  I can't think of an easy solution for this one.
Granted, it's not killer like the problem above, just suboptimal.


   Cliff Woolley
   Charlottesville, VA

View raw message