httpd-dev mailing list archives

From Cliff Woolley
Subject Re: PHP profiling results under 2.0.37 Re: Performance of Apache 2.0 Filter
Date Fri, 07 Jun 2002 20:33:09 GMT

On Fri, 7 Jun 2002, Brian Pane wrote:

> IMHO, that's a design flaw.  Regardless of whether PHP is doing
> buffering, it shouldn't break up blocks of static content into
> small pieces--especially not as small as 400 bytes.  While it's
> certainly valid for PHP to insert a flush bucket right before a
> block of embedded code (in case that code takes a long time to
> run), breaking static text into 400-byte chunks will usually mean
> that it takes *longer* for the content to reach the client, which
> probably defeats PHP's motivation for doing the nonbuffered output.
> There's code downstream, in the httpd's core_output_filter and
> the OS's TCP driver, that can make much better decisions about
> when to buffer and when not to buffer.

FWIW, I totally agree here.  One of the biggest problems with the way PHP
handles buckets (as I'm sure has been discussed before) is that
static content cannot remain in its native form as it goes through PHP, or
at least not in very big chunks.  Take as a counterexample the way
mod_include deals with FILE buckets.  It reads the FILE bucket (which
causes the file to be MMAPed if allowed), and from there it just scans
through the mmaped region, and if it finds nothing, it hands it on to the
next filter still in the single-MMAP-bucket form.  PHP/Zend, on the other
hand, takes the file descriptor out of the file bucket, runs it through a
lexical analyzer which tokenizes it up to 400 bytes at a time, runs it
through the yacc-generated grammar as necessary, and handles it from
there.  Far better would be to take the input, do a search through
it for a starting tag just as mod_include does, and if none is found (or
up until one is found), just tell the SAPI module to "go ahead and send up
to THIS point out to the client unmodified".
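
To make the "search for a starting tag" idea concrete, here is a rough
sketch in C.  It is not code from PHP, Zend, or mod_include; the function
name find_open_tag is made up for illustration, and it assumes the only
opening tag that matters is "<?".  Everything before the returned offset
is static content that could be handed on unmodified.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a PHP, Zend, or mod_include function).
 * Returns the offset of the first "<?" in buf[0..len), or len if there
 * is none.  Bytes before that offset are static content. */
size_t find_open_tag(const char *buf, size_t len)
{
    const char *p = buf;
    const char *end = buf + len;

    while (p < end) {
        /* Let memchr skip ahead to the next candidate '<'. */
        p = memchr(p, '<', (size_t)(end - p));
        if (p == NULL)
            return len;              /* no '<' left: all static */
        if (p + 1 < end && p[1] == '?')
            return (size_t)(p - buf);
        p++;                         /* a lone '<', keep looking */
    }
    return len;
}
```

If the function returns len, the whole region is static and could in
principle stay in whatever form it arrived in.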

So basically the difference between this and what we have now is that the
lexer should not take each 400 byte buffer and say "here is (up to) 400
bytes of inline HTML, send it to the client as-is"; instead, it should be
able to do something along the lines of taking the input 400 bytes at a
time, and as soon as it starts seeing inline HTML, keep track of the
starting offset (in bytes), and keep scanning through those 400 byte
buffers in a tight loop until it finds something that's NOT inline HTML,
and set the ending offset.  Then it can notify PHP in one call "send bytes
375-10136 to the client as-is, it's inline HTML".
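
Roughly, the bookkeeping could look like the sketch below.  This is only
an illustration under my own assumptions: html_run, html_run_feed, and
inline_html_end are hypothetical names, "<?" stands in for whatever ends
a run of inline HTML, and the input is treated as one contiguous region
that gets fed in 400-byte pieces.  The one subtle part is a "<?" that
straddles two buffers, which is what the pending_lt flag is for.

```c
#include <stddef.h>

#define CHUNK_SIZE 400   /* the lexer buffer size discussed above */

/* Hypothetical scanner state carried from one chunk to the next. */
typedef struct {
    size_t consumed;     /* bytes seen in all previous chunks */
    int pending_lt;      /* the last byte seen was '<' */
} html_run;

/* Feed one chunk.  Returns 1 and stores the absolute offset of "<?" in
 * *tag_off if the tag is found (even when the '<' was the last byte of
 * the previous chunk); returns 0 if the chunk is all inline HTML. */
int html_run_feed(html_run *r, const char *chunk, size_t n, size_t *tag_off)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (r->pending_lt && chunk[i] == '?') {
            *tag_off = r->consumed + i - 1;   /* position of the '<' */
            return 1;
        }
        r->pending_lt = (chunk[i] == '<');
    }
    r->consumed += n;
    return 0;
}

/* Scan src in CHUNK_SIZE pieces and return the offset where the run of
 * inline HTML ends: the position of "<?", or len if there is none.
 * The caller can then send bytes [0, result) in a single call. */
size_t inline_html_end(const char *src, size_t len)
{
    html_run r = { 0, 0 };
    size_t pos = 0;
    size_t tag_off;

    while (pos < len) {
        size_t n = (len - pos < CHUNK_SIZE) ? len - pos : CHUNK_SIZE;
        if (html_run_feed(&r, src + pos, n, &tag_off))
            return tag_off;
        pos += n;
    }
    return len;
}
```

The point is that the number of "send this as-is" calls then depends on
how many code blocks the page has, not on how many 400-byte buffers it
fills.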

Another important thing that the lexical analyzer needs to support is that
the user of Zend (PHP in this case) should be able to specify the YY_INPUT
function rather than being *forced* to give Zend a filename or file
descriptor.  That's absolutely critical for the filtering design under
Apache 2.0 to work right.  What we have now is a total kludge.
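
For what it's worth, flex already lets a scanner override YY_INPUT(buf,
result, max_size), so the mechanism exists at the lexer level.  The sketch
below shows the general shape, assuming an in-memory source.  The names
mem_source, mem_source_read, and current_source are hypothetical and not
part of Zend or flex; in a real .l file the macro would go in the
%{ ... %} prologue, and the source would more likely wrap data pulled from
the filter's input than a flat string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory input source for the lexer. */
typedef struct {
    const char *data;
    size_t len;
    size_t pos;
} mem_source;

/* Copy up to max_size bytes into buf and advance the source.
 * Returns the number of bytes copied; 0 means end of input, which is
 * the value flex treats as EOF. */
size_t mem_source_read(mem_source *s, char *buf, size_t max_size)
{
    size_t n = s->len - s->pos;

    if (n > max_size)
        n = max_size;
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return n;
}

/* Hypothetical global telling the lexer where to read from. */
mem_source *current_source;

/* Replace the default (which reads from a FILE *) with our own reader. */
#undef YY_INPUT
#define YY_INPUT(buf, result, max_size) \
    ((result) = mem_source_read(current_source, (buf), (max_size)))
```

With something like this, the caller decides where the bytes come from,
and the lexer never needs a filename or file descriptor.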

I realize I'm calling into question some fundamental design decisions of
which I was not a part, so of course there is the possibility that I'm
missing some important detail.  But I think it would be relatively easy to
insert an optimization here that could make a huge difference without
breaking too many assumptions in the code.  I think.

