httpd-dev mailing list archives

From Greg Stein
Subject [PATCH] ap_r* performance patch
Date Fri, 19 Jan 2001 03:17:01 GMT
I've attached the patch and an strace of my earlier-described design. Greg
Ames ran an "ab" test to measure the performance improvement for the other
patch and saw a 2.6x increase in speed. The patch below gives 3.5x :-)

I'll describe the design in detail:

*) there is a new filter named OLD_WRITE (to support the old writing style;
   the name isn't important, though :-)

*) OLD_WRITE is installed as an AP_FTYPE_CONTENT-1 filter so that it will
   stay at the head of the output chain whenever possible.

*) the filter keeps a buffer of AP_MIN_BYTES_TO_WRITE (9K) in its context,
   allocated using malloc(). Later on, this will be inserted into a HEAP
   bucket and sent down the chain. Since it is on the HEAP, it will be
   returned to storage after delivery down the chain.

*) whenever *any* brigade arrives in the filter chain, this filter will
   prepend its buffer contents (if any) to the brigade and pass all the
   content on down the chain. This provides correct ordering in a
   mixed-output environment. (the buffer contents arrived before the new
   brigade, so they must go out first)

*) the flush_buffer() routine is used to deliver the filter's buffer to the
   output chain. optional "extra" data may be delivered, too. This latter
   feature is to support large writes -- they go straight into the chain
   rather than getting copied into a buffer.

*) buffer_output() is the main function to buffer the output for delivery.

   a) it searches for whether OLD_WRITE has been inserted, and inserts it if
      it hasn't been inserted already.

   b) if OLD_WRITE isn't the first filter, then the data is delivered
      normally into the output chain.
      - if we did a regular buffer-store, then the content would skip
        filters in between the top and the OLD_WRITE filter
      - this implies that we return to the current bit-by-bit behavior if
        something gets in the way. (nothing should, given our filter type)

   c) the buffer is allocated, if necessary, and the data is placed into the
      buffer if room is available. if the buffer becomes full, then it is
      flushed to the output.

   d) if the new data is "big", then we flush the existing contents plus the
      big data (as "extra").

   e) otherwise, the data is "small" but doesn't fit in the buffer. we max
      out the buffer, flush it down the chain, and put the remainder into a
      new buffer.

*) the ap_r* functions simply call buffer_output() to buffer/deliver
   their data.

*) a couple further improvements have been noted within the patch.
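
For readers who want something concrete, steps (c) through (e) of
buffer_output() can be sketched roughly as below. This is a minimal
illustration, not the code from the patch: sink_t, old_write_ctx and
MIN_BYTES_TO_WRITE are hypothetical stand-ins for the filter chain, the
filter context and AP_MIN_BYTES_TO_WRITE, and "big" is assumed to mean "at
least one full buffer", which the description above does not actually
specify. Steps (a) and (b), the filter lookup and the fallback path, are
left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MIN_BYTES_TO_WRITE 9000 /* stand-in for AP_MIN_BYTES_TO_WRITE (9K) */

/* Hypothetical stand-in for the downstream chain: it only counts
 * deliveries and bytes so the buffering behavior can be observed. */
typedef struct { size_t deliveries; size_t bytes; } sink_t;

/* Hypothetical stand-in for the OLD_WRITE filter context. */
typedef struct {
    char   *buf;  /* allocated lazily with malloc() */
    size_t  len;  /* bytes currently buffered */
    sink_t *next; /* where flushed data goes */
} old_write_ctx;

/* Deliver the buffered bytes plus optional "extra" data in one delivery. */
static void flush_buffer(old_write_ctx *ctx, const char *extra, size_t extra_len)
{
    assert(ctx != NULL && ctx->next != NULL);
    (void)extra; /* the stand-in sink counts bytes, it does not copy them */
    if (ctx->len == 0 && extra_len == 0)
        return;
    ctx->next->deliveries++;
    ctx->next->bytes += ctx->len + extra_len;
    free(ctx->buf); /* in the described design the HEAP bucket owns the memory */
    ctx->buf = NULL;
    ctx->len = 0;
}

static char *new_buffer(void)
{
    char *p = malloc(MIN_BYTES_TO_WRITE);
    if (p == NULL)
        abort();
    return p;
}

static void buffer_output(old_write_ctx *ctx, const char *data, size_t n)
{
    size_t room;

    /* (d) "big" data (assumed threshold): flush the buffered contents
     * plus the new data as "extra", with no copy into the buffer. */
    if (n >= MIN_BYTES_TO_WRITE) {
        flush_buffer(ctx, data, n);
        return;
    }

    /* (c) allocate on demand, store if it fits, flush when full. */
    if (ctx->buf == NULL)
        ctx->buf = new_buffer();
    room = MIN_BYTES_TO_WRITE - ctx->len;
    if (n <= room) {
        memcpy(ctx->buf + ctx->len, data, n);
        ctx->len += n;
        if (ctx->len == MIN_BYTES_TO_WRITE)
            flush_buffer(ctx, NULL, 0);
        return;
    }

    /* (e) small but does not fit: max out the buffer, flush it, and
     * start a new buffer with the remainder. */
    memcpy(ctx->buf + ctx->len, data, room);
    ctx->len = MIN_BYTES_TO_WRITE;
    flush_buffer(ctx, NULL, 0);
    ctx->buf = new_buffer();
    memcpy(ctx->buf, data + room, n - room);
    ctx->len = n - room;
}
```

With this sketch, a run of small writes reaches the sink as one delivery
per full buffer, and a large write goes out together with whatever was
already buffered, which is the ordering guarantee described above.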

Of course, please pipe up if you have questions.

Benefits of this patch:

1) 3.5x speedup in requests/sec and throughput on my box
2) strace shows a single writev and hugely improved alloc behavior
3) mixed-style output is possible
4) network congestion is used
5) memory is NOT proportional to output size: it mallocs/frees at each
   network delivery
6) ap_r* API is unchanged

Yesterday, Ryan posted a list of five "priorities", which are all met by the
above list of benefits.


Greg Stein
