Date: Fri, 1 Jun 2001 11:47:59 -0700 (PDT)
From: Ryan Bloom <rbb@apache.org>
To: new-httpd@apache.org
Subject: Re: file/mmap buckets, subrequests, pools, 2.0.18

On 1 Jun 2001, Jeff Trawick wrote:

>  writes:
>
> > On Fri, 1 Jun 2001, Jeff Trawick wrote:
> >
> > > Greg Ames put together a 2.0.18 build for the purposes of running on
> > > apache.org, but we were never able to switch over to it from 2.0.16
> > > because of a hard failure with certain types of requests.
> > >
> > > The problem, trying to simplify a bit:
> > >
> > > In a subrequest, default_handler opens a file using the pool
> > > associated with the subrequest and creates a file bucket.
> > >
> > > After the subrequest is over, we still have the file bucket with the
> > > apr_file_t that had the now-destroyed pool. Any references to the
> > > apr_file_t are broken because the pool was destroyed.
> > >
> > > We finally died back in the main request processing because we
> > > referenced an mmap bucket whose apr_mmap_t used the same pool as the
> > > apr_file_t. The apr_mmap_t had been overlaid because the storage was
> > > released at the end of the main request and subsequently reused for
> > > some other purpose.
> > >
> > > We have never hit this problem with 2.0.16, and we always hit it with
> > > 2.0.18, leading me to believe that the problem was affected by recent
> > > code changes, even though it seems to be a basic design issue.
> > >
> > > Any ideas about what recent changes might have introduced or exposed
> > > this problem?
> >
> > This is relatively simple. A while ago, I changed the default handler
> > to use the connection pool to fix this problem. A couple of months
> > ago, Dean pointed out that this was a major resource leak. After
> > 2.0.16, somebody (Roy?) pointed out that this was a pretty big problem
> > when serving a lot of very large files on the same connection.
> >
> > The solution was a simple loop at the end of the core_output_filter
> > that reads the data from the file into memory. This is okay to do,
> > because we are guaranteed to have less than 9k of data. It sounds like
> > the problem is that we don't read in the data if we have an MMAP, or
> > we may not be getting into the loop on sub-requests.
>
> By "simple loop" do you mean the call to ap_save_brigade()?
>
> I haven't verified that we even get to that point on a subrequest, but
> my current guess is that we get to ap_save_brigade() but the setaside
> doesn't do anything for the mmap bucket because mmaps aren't normally
> transient.
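Before digging into the filter code, the lifetime problem in code form
looks roughly like this. This is a hypothetical sketch, not the actual
default_handler: the handler name is made up, the bucket creation is
elided, and the signatures are just from memory of the 2.0 tree:

#include "httpd.h"
#include "util_filter.h"
#include "apr_buckets.h"
#include "apr_file_io.h"

/* Sketch only: the pattern that bites us on subrequests. */
static int serve_file_from_subreq(request_rec *r)  /* r is a subrequest */
{
    apr_file_t *fd;
    apr_bucket_brigade *bb = apr_brigade_create(r->pool);

    /* The file is opened out of the *subrequest* pool... */
    apr_file_open(&fd, r->filename, APR_READ, APR_OS_DEFAULT, r->pool);

    /* ...a file bucket wrapping fd is sent down the filter stack, and a
     * read may morph it into an mmap bucket whose apr_mmap_t also lives
     * in r->pool. */
    /* ... create a file bucket from fd and insert it into bb ... */

    return ap_pass_brigade(r->output_filters, bb);
}

/* Back in the main request, ap_destroy_sub_req() clears the subrequest
 * pool, so fd (and any apr_mmap_t made from it) is gone -- but the
 * bucket can still be sitting in core_output_filter's saved brigade.
 * That dangling reference is exactly the overlaid apr_mmap_t Jeff
 * describes above. */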
Okay, I had a chance to look at the code, instead of going from memory.
The code that is causing the problem is:

if ((!fd && !more && (nbytes + flen < AP_MIN_BYTES_TO_WRITE)
     && !APR_BUCKET_IS_FLUSH(e))
    || (nbytes + flen < AP_MIN_BYTES_TO_WRITE
        && APR_BUCKET_IS_EOS(e) && c->keepalive)) {
    /* NEVER save an EOS in here. If we are saving a brigade with
     * an EOS bucket, then we are doing keepalive connections, and
     * we want to process the second request fully.
     */
    if (APR_BUCKET_IS_EOS(e)) {
        apr_bucket *bucket = NULL;
        /* If we are in here, then this request is a keepalive. We
         * need to be certain that any data in a bucket is valid
         * after the request_pool is cleared.
         */
        if (ctx->b == NULL) {
            ctx->b = apr_brigade_create(f->c->pool);
        }
        APR_BRIGADE_FOREACH(bucket, b) {
            const char *str;
            apr_size_t n;

            rv = apr_bucket_read(bucket, &str, &n, APR_BLOCK_READ);
            /* This apr_brigade_write does not use a flush function,
             * because we assume that we will not write enough data
             * into it to cause a flush. However, if we *do* write
             * "too much", then we could end up with transient
             * buckets, which would suck. This works for now, but is
             * a bit shaky if changes are made to some of the
             * buffering sizes. Let's do an assert to prevent
             * potential future problems...
             */
            AP_DEBUG_ASSERT(AP_MIN_BYTES_TO_WRITE <= APR_BUCKET_BUFF_SIZE);
            apr_brigade_write(ctx->b, NULL, NULL, str, n);
        }
        apr_brigade_destroy(b);
    }
    else {
        ap_save_brigade(f, &ctx->b, &b);
    }
    return APR_SUCCESS;
}

This is meant to solve the problem, but we only go into the loop that
solves it if we see an EOS bucket. Of course, the sub-request filter is
removing the EOS bucket. So the solution, IMO, is to do this same loop
in the subreq_output_filter.

Ryan

_______________________________________________________________________________
Ryan Bloom                                              rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------
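
P.S. For concreteness, here is roughly the loop I am picturing in the
subrequest output filter. This is an untested sketch, not a patch: the
filter name is approximate, using the parent request's pool is my
assumption, and it deliberately skips the AP_MIN_BYTES_TO_WRITE-style
size checks that core_output_filter does (a real patch would want
those, or we reintroduce the resource leak Dean complained about):

#include "httpd.h"
#include "util_filter.h"
#include "apr_buckets.h"

static apr_status_t subreq_output_filter(ap_filter_t *f,
                                         apr_bucket_brigade *b)
{
    if (!APR_BRIGADE_EMPTY(b)
        && APR_BUCKET_IS_EOS(APR_BRIGADE_LAST(b))) {
        /* Copy the data into memory that lives at least as long as
         * the main request, so no file/mmap bucket survives pointing
         * into the subrequest pool. (Assumes f->r->main is the
         * parent request; error handling elided.) */
        apr_bucket_brigade *safe = apr_brigade_create(f->r->main->pool);
        apr_bucket *bucket;

        APR_BRIGADE_FOREACH(bucket, b) {
            const char *str;
            apr_size_t n;

            if (APR_BUCKET_IS_EOS(bucket)) {
                break;  /* drop the EOS, as this filter already does */
            }
            apr_bucket_read(bucket, &str, &n, APR_BLOCK_READ);
            apr_brigade_write(safe, NULL, NULL, str, n);
        }
        apr_brigade_destroy(b);
        return ap_pass_brigade(f->next, safe);
    }
    return ap_pass_brigade(f->next, b);
}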