httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Greg Stein <gst...@lyra.org>
Subject Re: cvs commit: apache-2.0/src/main http_protocol.c
Date Thu, 09 Nov 2000 00:45:09 GMT
On Wed, Nov 08, 2000 at 06:34:13AM -0800, rbb@covalent.net wrote:
> My CVS change:
>...
> >   -        curr_length = range_end - range_start + 1;
> >   -        loc = range = apr_pcalloc(r->pool, curr_length + 1);
> >   +        /* ### this is so bogus, but not dealing with it right now */
> >   +        range = loc = apr_pcalloc(r->pool, range_length + 1);
> 
> Why is this bogus?  We have no choice but to copy the buffer.

If we must copy the buffer, then we've done something wrong :-)

This certainly isn't your fault/design, but a limitation that we still have
in the brigade/bucket system. But! I have an idea... :-)

[ separate email for that ]

> >   +        /* ### we should split() buckets rather than read() them. this
> >   +           ### will allow us to avoid reading files or custom buckets
> >   +           ### into memory. for example: we REALLY don't want to cycle
> >   +           ### a 10gig file into memory just to send out 100 bytes from
> >   +           ### the end of the file.
> >   +           ###
> >   +           ### content generators that used to call ap_each_byterange()
> >   +           ### manually (thus, optimizing the output) can replace their
> >   +           ### behavior with a new bucket type that understands split().
> >   +           ### they can then defer reading actual content until a read()
> >   +           ### occurs, using the split() as an internal "seek".
> >   +        */
> >   +
> >            ap_bucket_read(e, &str, &n, AP_NONBLOCK_READ);
> >   +        /* ### using e->length doesn't account for pipes */
> 
> Yes it does, because the read that is just above it converted the pipes
> into blocks of actual data.  The thing is that our file reading is bogus,
> but I can't figure out how to fix it, so it has to stay as it is for now.

What should happen is that we split() the bucket. That will prevent a FILE
bucket from reading contents into memory. A PIPE or SOCKET bucket will need
to read up to the split point (into memory), but there may not be much we
can do about that. [ certainly, a first pass will do this; we could
potentially get a bit smarter and allow splitting pipes in some way ]

Thus, using a split() as we skip up to range_start, we can prevent reading
stuff into memory. This is important for FILE buckets, and possibly for
third part buckets (I have one planned for Subversion's content, and mod_dav
will make it easier for custom back-ends to use a bucket to represent a
resource).

> >            while (range_start > (curr_offset + e->length)) {
> >                curr_offset += e->length;
> >                e = AP_BUCKET_NEXT(e);
> >   +
> >                if (e == AP_BRIGADE_SENTINEL(bb)) {
> >                    break;
> >                }
> >   +
> >   +            /* ### eventually we can avoid this */
> >   +            ap_bucket_read(e, &str, &n, AP_NONBLOCK_READ);
> 
> How?  We have just found the start of the data, so we need to have it in
> memory so that we can copy it.

Nope. This is still part of the code that skipping up to range_start.

You are correct, though, that we will need to read() at some point to copy
the bytes into the output buffer. But that is a different loop :-)

> >   -            apr_cpystrn(loc, str + (range_start - curr_offset), 
> >   -                        MIN_LENGTH(curr_length + 1, e->length));
> >   -            loc += MIN_LENGTH(curr_length + 1, e->length);
> >   -            curr_length -= MIN_LENGTH(curr_length + 1, e->length);
> >   +
> >   +            /* ### we should use 'n', not e->length */
> >   +            segment_length = MIN_LENGTH(curr_length + 1, e->length);
> 
> It doesn't matter if we use n or e->length, because the reads that you
> want to get rid of make them equivalent.

At this point in the code, you are correct. I just thought it would be
semantically cleaner to use what read() returns for the length. But I didn't
want to go and change that just yet :-)

> Can I assume that the only difference in the output is that we now add the
> content length correctly?

In terms of output: yes. But I also made a number of logical changes and
improved the memory footprint quite a bit. There were items getting
allocated on every entry to the filter (and then not used). I tossed a lot
of those and also simplified some other logic. etc.

> If so, I had an idea last night that I think is
> slightly cleaner than this.  If we put back the original byterange filter,
> and add it to the filter stack before the content-length filter for all
> requests, it can check very quickly to determine if it should be
> active.  If not, it just removes itself.  If so, it does the byterange
> magic.  Then, we can rely on the content-length filter to compute the C-L
> correctly everytime.

Using the C-L filter to compute the length is a definitely possibility.
However, the C-L works now, and I have some other changes to make to the
byterange filter to optimize it drastically. Please keep this idea in mind,
but it is working, so I'd like to defer this idea for a moment.

> I figured this out late last night, and I kind of like it because it
> re-uses code, and allows us to get aways with just having a single place
> to calculate the C-L, which is a VERY good thing IMHO.

Yes, it does make good sense, and I'm inclined to agree it would be good. It
would simplify ap_set_byterange(). But I want to take a closer look at the
C-L filter to see how it does it work first. I'm hoping it avoids a read().
But in any case, let's hold the thought for a bit (and we can since the
byterange output appears to be working)

> If there were more differences, (other than general cleanup), please let
> me know.  I would love to know what the originl problem was.

Look at the loop in ap_set_byterange(). It needs to add in the length of the
boundaries and part-headers to the overall content-length. It wasn't doing
that at all. (if you look at the "realreq" logic in the original code, you
can see where it was doing that)

Have you happened to try the new code with the PDF plugin? I don't have the
testing setup for PDF here. I was eyeballing the output, and using a
fetching script to ensure that my Content-Length was computed properly.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Mime
View raw message