httpd-cvs mailing list archives

From field...@locus.apache.org
Subject cvs commit: apache-2.0/src/lib/apr/buckets doc_wishes.txt
Date Thu, 13 Jul 2000 07:25:47 GMT
fielding    00/07/13 00:25:44

  Modified:    src/lib/apr/buckets doc_wishes.txt
  Log:
  More use cases and functional desires.
  
  Revision  Changes    Path
  1.2       +258 -34   apache-2.0/src/lib/apr/buckets/doc_wishes.txt
  
  Index: doc_wishes.txt
  ===================================================================
  RCS file: /home/cvs/apache-2.0/src/lib/apr/buckets/doc_wishes.txt,v
  retrieving revision 1.1
  retrieving revision 1.2
  diff -u -r1.1 -r1.2
  --- doc_wishes.txt	2000/07/13 05:03:41	1.1
  +++ doc_wishes.txt	2000/07/13 07:25:44	1.2
  @@ -6,40 +6,264 @@
   Dirk's original list:
   ---------------------
   
  -This file exists so that I do not have to remind myself
  -about the reasons for Layered IO, apart from the obvious one.
  +  This file exists so that I do not have to remind myself
  +  about the reasons for Layered IO, apart from the obvious one.
   
  -0. To get away from a 1-to-1 mapping
  +  0. To get away from a 1-to-1 mapping
   
  -   i.e. a single URI can cause multiple backend requests,
  -   in arbitrary configurations, such as in parallel, tunnel/piped,
  -   or in some sort of funnel mode. Such multiple backend
  -   requests, with fully layered IO, can be treated exactly
  -   like any URI request; and recursion is born :-)
  -
  -1. To do on-the-fly charset conversion
  -
  -   Be able, theoretically, to send out your content using
  -   latin1, latin2 or any other charset, generated from static
  -   _and_ dynamic content in other charsets (typically Unicode
  -   encoded as UTF-7 or UTF-8). Such conversion is prompted by
  -   things like the user-agent string, a cookie, or other hints
  -   about the capabilities of the OS, language preferences and
  -   other (in)capabilities of the final recipient.
  -
  -2. To be able to do fancy templates
  -
  -   Have your application/cgi send out an XML structure of
  -   field/value paired content, which is substituted into a
  -   template by the web server, possibly based on information
  -   accessible/known to the web server which you do not want
  -   to be known to the backend script. Ideally that template
  -   would be just as easy for a backend to generate as well (see 0).
  -
  -3. On-the-fly translation
  -
  -   And other general text and output munging, such as translating
  -   an English page into Spanish whilst it goes through your proxy,
  -   or JPEG-ing a GIF generated by mod_perl+gd.
  +     i.e. a single URI can cause multiple backend requests,
  +     in arbitrary configurations, such as in parallel, tunnel/piped,
  +     or in some sort of funnel mode. Such multiple backend
  +     requests, with fully layered IO, can be treated exactly
  +     like any URI request; and recursion is born :-)
   
  -Dw.
  +  1. To do on-the-fly charset conversion
  +
  +     Be able, theoretically, to send out your content using
  +     latin1, latin2 or any other charset, generated from static
  +     _and_ dynamic content in other charsets (typically Unicode
  +     encoded as UTF-7 or UTF-8). Such conversion is prompted by
  +     things like the user-agent string, a cookie, or other hints
  +     about the capabilities of the OS, language preferences and
  +     other (in)capabilities of the final recipient. (A rough
  +     sketch of such a conversion appears after this list.)
  +
  +  2. To be able to do fancy templates
  +
  +     Have your application/cgi send out an XML structure of
  +     field/value paired content, which is substituted into a
  +     template by the web server, possibly based on information
  +     accessible/known to the web server which you do not want
  +     to be known to the backend script. Ideally that template
  +     would be just as easy for a backend to generate as well (see 0).
  +
  +  3. On-the-fly translation
  +
  +     And other general text and output munging, such as translating
  +     an English page into Spanish whilst it goes through your proxy,
  +     or JPEG-ing a GIF generated by mod_perl+gd.
  +
  +  Dw.
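  +
  +  As a rough illustration of use case 1, here is a minimal C sketch of
  +  the conversion step itself, using iconv(3).  The function name and
  +  shape are assumptions for illustration, not the Apache filter
  +  interface; a real filter would be handed the stream in pieces and
  +  would have to carry iconv's shift state across calls.
  +
  +      #include <iconv.h>
  +      #include <stddef.h>
  +
  +      /* Convert a UTF-8 buffer to Latin-1.  Returns 0 on success, -1 on
  +       * error (including characters with no Latin-1 equivalent). */
  +      static int to_latin1(const char *in, size_t inlen,
  +                           char *out, size_t outlen)
  +      {
  +          iconv_t cd = iconv_open("ISO-8859-1", "UTF-8");
  +          char *inp = (char *) in;   /* iconv's API is not const-clean */
  +          char *outp = out;
  +          size_t rc;
  +
  +          if (cd == (iconv_t) -1)
  +              return -1;
  +          rc = iconv(cd, &inp, &inlen, &outp, &outlen);
  +          iconv_close(cd);
  +          return (rc == (size_t) -1) ? -1 : 0;
  +      }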
  +
  +
  +Dean's canonical list of use cases
  +----------------------------------
  +
  +Date: Mon, 27 Mar 2000 17:37:25 -0800 (PST)
  +From: Dean Gaudet <dgaudet-list-new-httpd@arctic.org>
  +To: new-httpd@apache.org
  +Subject: canonical list of i/o layering use cases
  +Message-ID: <Pine.LNX.4.21.0003271648270.14812-100000@twinlark.arctic.org>
  +
  +i really hope this helps this discussion move forward.
  +
  +the following is the list of all applications i know of which have been
  +proposed to benefit from i/o layering.
  +
  +- data sink abstractions:
  +	- memory destination (for ipc; for caching; or even for abstracting
  +		things such as strings, which can be treated as an i/o
  +		object)
  +	- pipe/socket destination
  +	- portability variations on the above
  +
  +- data source abstraction, such as:
  +	- file source (includes proxy caching)
  +	- memory source (includes most dynamic content generation)
  +	- network source (TCP-to-TCP proxying)
  +	- database source (which is probably, under the covers, something like
  +		a memory source mapped from the db process on the same box,
  +		or from a network source on another box)
  +	- portability variations in the above sources
  +
  +- filters:
  +	- encryption
  +	- translation (ebcdic, unicode)
  +	- compression
  +	- chunking
  +	- MUX
  +	- mod_include et al
  +
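  +as a rough sketch of the shape all of the above share -- every source,
  +sink, and filter presenting one uniform interface -- consider something
  +like this (hypothetical names, not an actual apache or apr api):
  +
  +	#include <stddef.h>
  +
  +	/* a layer knows nothing about its neighbours beyond this
  +	 * interface; stacking layers is what gives you filters */
  +	typedef struct io_layer io_layer;
  +	struct io_layer {
  +	    /* pull up to *len bytes into buf; *len returns the count */
  +	    int (*read)(io_layer *self, char *buf, size_t *len);
  +	    /* push len bytes toward the downstream neighbour */
  +	    int (*write)(io_layer *self, const char *buf, size_t len);
  +	    io_layer *next;	/* downstream neighbour */
  +	    void *ctx;		/* per-layer state (iconv_t, z_stream, ...) */
  +	};
  +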
  +and here are some of my thoughts on trying to further quantify filters:
  +
  +a filter separates two layers and is both a sink and a source.  a
  +filter takes an input stream of bytes OOOO... and generates an
  +output stream of bytes which can be broken into blocks such
  +as:
  +
  +	OOO NNN O NNNNN ...
  +
  +	where O = an old or original byte copied from the input
  +	and N = a new byte generated by the filter
  +
  +for each filter we can calculate a quantity i'll call the copied-content
  +ratio, or CCR:
  +
  +	nbytes_old / nbytes_new
  +
  +where:
  +	nbytes_old = number of bytes in the output of the
  +		filter which are copied from the input
  +		(in zero-copy this would mean "copy by
  +		reference counting an input buffer")
  +	nbytes_new = number of bytes which are generated
  +		by the filter which weren't present in the
  +		input
  +
  +examples:
  +
  +CCR = infinity:  who cares -- straight through with no
  +	transformation.  the filter shouldn't even be there.
  +
  +CCR = 0: encryption, translation (ebcdic, unicode), compression.
  +	these get zero benefit from zero-copy.
  +
  +CCR > 0: chunking, MUX, mod_include
  +
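  +as a worked example (illustrative numbers only): a chunking filter
  +emitting 8KB chunks copies 8192 old bytes per chunk and generates only
  +the chunk framing -- roughly 8 new bytes for the hex length line and
  +the CRLFs -- so its CCR is about 8192 / 8 = 1024, squarely in the
  +range where zero-copy can pay off.
  +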
  +from the point of view of evaluating the benefit of zero-copy we only
  +care about filters with CCR > 0 -- because CCR = 0 cases degenerate into
  +a single-copy scheme anyhow.
  +
  +it is worth noting that the large_write heuristic in BUFF fairly
  +clearly handles zero-copy at very little overhead for CCRs larger than
  +DEFAULT_BUFSIZE.
  +
  +what needs further quantification is what the CCR of mod_include would
  +be.
  +
  +for a particular zero-copy implementation we can find some threshold k
  +where filters with CCRs >= k are faster with the zero-copy implementation
  +and CCRs < k are slower... faster/slower as compared to a baseline
  +implementation such as the existing BUFF.
  +
  +it's my opinion that when you consider the data sources listed above, and
  +the filters listed above that *in general* the existing BUFF heuristics
  +are faster than a complete zero-copy implementation.
  +
  +you might ask how this jibes with published research such as the
  +IO-Lite stuff.  well, when it comes right down to it, the research in
  +the IO-Lite papers deals with very large CCRs and contrasts them against
  +a naive buffering implementation such as stdio -- they don't consider
  +what a few heuristics such as apache's BUFF can do.
  +
  +Dean
  +
  +
  +Jim's summary of a discussion
  +-----------------------------
  +
  +  OK, so the main points we wish to address are (in no particular order):
  +
  +     1. zero-copy
  +     2. prevent modules/filters from having to glob the entire
  +        data stream in order to start processing/filtering
  +     3. the ability to layer and "multiplex" data and meta-data
  +        in the stream (see the sketch after this list)
  +     4. the ability to perform all HTTP processing at the
  +        filter level (including proxy), even if not implemented in
  +        this phase
  +     5. Room for optimization and recursion
  +
  +  Jim Jagielski
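  +
  +  A minimal sketch of point 3, mixing data and meta-data in a single
  +  stream (hypothetical names, along the lines of the bucket types this
  +  directory is building):
  +
  +      #include <stddef.h>
  +
  +      /* A tagged chunk: either bytes, or in-band meta-data such as
  +       * end-of-stream or a flush request. */
  +      enum chunk_kind { CHUNK_DATA, CHUNK_EOS, CHUNK_FLUSH };
  +
  +      struct chunk {
  +          enum chunk_kind  kind;
  +          const char      *bytes;   /* CHUNK_DATA only */
  +          size_t           len;     /* CHUNK_DATA only */
  +          struct chunk    *next;    /* chunks chain into a brigade */
  +      };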
  +
  +
  +Roy's ramblings
  +---------------
  +
  +  Data flow networks are a very well-defined and understood software
  +  architecture.  They have a single, very important constraint: no filter
  +  is allowed to know anything about the nature of its upstream or downstream
  +  neighbors beyond what is defined by the filter's own interface.
  +  That constraint is what makes data flow networks highly configurable and
  +  reusable.  Those are properties that we want from our filters.
  +
  +  ...
  +
  +  One of the goals of the filter concept was to untangle the bird's nest
  +  of interconnected side-effect conditions that allows buff to perform
  +  well, without losing that performance.  That's why there is so much
  +  trepidation about anyone messing with 1.3.x buff.
  +
  +  ...
  +
  +  Content filtering is my least important goal.  Completely replacing HTTP
  +  parsing with a filter is my primary goal, followed by a better proxy,
  +  then internal memory caches, and finally zero-copy sendfile (in order of
  +  importance, but in reverse order of likely implementation).  Content
  +  filtering is something we get for free using the bucket brigade interface,
  +  but we don't get anything for free if we start with an interface that only
  +  supports content filtering.
  +
  +  ...
  +
  +  I don't think it is safe to implement filters in Apache without either
  +  a smart allocation system or a strict limiting mechanism that prevents
  +  filters from buffering more than 8KB [or user-definable amount] of memory
  +  at a time (for the entire non-flushed stream).  It isn't possible to
  +  create a robust server implementation using filters that allocate memory
  +  from a pool (or the heap, or a stack, or whatever) without somehow
  +  reclaiming and reusing the memory that gets written out to the network.
  +  There is a certain level of "optimization" that must be present before
  +  any filtering mechanism can be in Apache, and that means meeting the
  +  requirement that the server not keel over and die the first time a user
  +  requests a large filtered file.  XML tree manipulation is an example
  +  where that can happen.
  +
  +  ...
  +
  +  Disabling content-length just because there are filters in the stream
  +  is a blatant cop-out.  If you have to do that then the design is wrong.
  +  At the very least the HTTP filter/buff should be capable of discovering
  +  whether it knows the content length by examining whether it has the whole
  +  response in buffer (or fd) before it sends out the headers.
  +
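  +  A minimal sketch of that probe, with hypothetical names (the real
  +  decision would live in the HTTP filter):
  +
  +      #include <stdio.h>
  +
  +      struct response {
  +          const char *body;   /* NULL while the body is still being made */
  +          long        len;    /* valid only when body != NULL */
  +      };
  +
  +      static void emit_framing_headers(const struct response *r)
  +      {
  +          if (r->body != NULL)        /* whole response in hand */
  +              printf("Content-Length: %ld\r\n", r->len);
  +          else                        /* must stream it instead */
  +              printf("Transfer-Encoding: chunked\r\n");
  +      }
  +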
  +  ...
  +
  +  No layered-IO solution will work with the existing memory allocation
  +  mechanisms of Apache.  The reason is simply that some filters can
  +  incrementally process data and some filters cannot, and they often
  +  won't know the answer until they have processed the data they are given.
  +  This means the buffering mechanism needs some form of overflow mechanism
  +  that diverts parts of the stream into a slower-but-larger buffer (file),
  +  and the only clean way to do that is to have the memory allocator for the
  +  stream also do paging to disk.  You can't do this within the request pool
  +  because each layer may need to allocate more total memory than is available
  +  on the machine, and you can't depend on some parts of the response being
  +  written before later parts are generated because some filtering
  +  decisions require knowledge of the end of the stream before they
  +  can process the beginning.
  +
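  +  As an illustration of such an overflow mechanism, here is a sketch of
  +  a buffer that diverts to a temporary file once a fixed in-memory cap
  +  is hit (hypothetical names; the 8KB cap echoes the limit suggested
  +  earlier):
  +
  +      #include <stdio.h>
  +      #include <string.h>
  +
  +      #define MEM_CAP 8192            /* per-stream in-memory limit */
  +
  +      struct spill_buf {
  +          char    mem[MEM_CAP];
  +          size_t  used;
  +          FILE   *overflow;           /* NULL until the cap is exceeded */
  +      };
  +
  +      static int spill_write(struct spill_buf *b, const char *p, size_t n)
  +      {
  +          size_t room = MEM_CAP - b->used;
  +          size_t take = (n < room) ? n : room;
  +
  +          memcpy(b->mem + b->used, p, take);
  +          b->used += take;
  +          if (take < n) {             /* divert the rest to disk */
  +              if (b->overflow == NULL && (b->overflow = tmpfile()) == NULL)
  +                  return -1;
  +              if (fwrite(p + take, 1, n - take, b->overflow) != n - take)
  +                  return -1;
  +          }
  +          return 0;
  +      }
  +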
  +  ...
  +
  +  The purpose of the filtering mechanism is to provide a useful
  +  and easy-to-understand means for extending the functionality of
  +  independent modules (filters) by rearranging them in stacks
  +  via a uniform interface.
  +
  +
  +Paul J. Reder's use cases for filters
  +-------------------------------------
  +
  +  1) Containing only text.
  +  2) Containing 10 .gif or .jpg references (perhaps filtering
  +     from one format to the other).
  +  3) Containing an exec of a cgi that generates a text only file
  +  4) Containing an exec of a cgi that generates an SSI of a text only file.
  +  5) Containing an exec of a cgi that generates an SSI that execs a cgi
  +     that generates a text only file (that swallows a fly, I don't know why).
  +  6) Containing an SSI that execs a cgi that generates an SSI that
  +     includes a text only file.
  +     NOTE: Solutions must be able to handle *both* 5 and 6. Order
  +           shouldn't matter.
  +  7) Containing text that must be altered via a regular expression
  +     filter to change all occurrences of "rederpj" to "misguided".
  +  8) Containing text that must be altered via a regular expression
  +     filter to change all occurrences of "rederpj" to "lost".
  +  9) Containing perl or php that must be handed off for processing.
  +  10) A page in ASCII that needs to be converted to EBCDIC, or from
  +      one code page to another.
  +  11) Use the babelfish translation filter to translate text on a
  +      page from Spanish to Martian-Swahili.
  +  12) Translate to Esperanto, compress, and encrypt the output from
  +      a php program generated by a perl script called from a cgi exec
  +      embedded in a file included by an SSI  :)
  +
  
  
  
