httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ryan Bloom <>
Subject Re: Input Filtering and Pushback.
Date Mon, 31 Dec 2001 18:02:41 GMT
On Monday 31 December 2001 02:40 am, Justin Erenkrantz wrote:
> I've attempted to think some more about how we should handle 
> "pushback" in the input filtering (say that a caller does a 
> speculative read - i.e. MIME-continuation in ap_[r]getline).

First of all, this isn't pushback at all.  Pushback means that you are
taking data that you have read, and are handing it back to a lower
filter.  That doesn't exist in Apache's filtering model, and it isn't

> I think the best strategy would be to introduce a new input filter
> mode (call it PEEK and rename the current PEEK mode to EATCRLF) that 

I would recommend not re-using the PEEK mode, because that will
cause confusion to people who have written input filters already.

> will indicate to input filters that this data may not be consumed
> and that it should hold on to the data if possible so that it can
> be read again (on a non-PEEK call).  Using this strategy wisely 
> should remove all instances of a req_cfg->bb (such as what 
> ap_[r]getline uses).  Data may only be stored inside the filter
> chain (no external brigades with unread input are allowed!).
> What follows is my high-level algorithm on how ap_[r]getline would 
> work with this strategy.  Please provide any feedback or thoughts
> you may have on this.  Thanks.  -- justin
> ap_getline with buckets and brigades:
> 1. call ap_get_brigade with 0 to indicate want LF-line.
>    - ap_get_brigade may return with a line that does not have
>      a LF-ending (i.e. len == HUGE_STRING_LEN).
> 2. Check to see if we got an error value back (!= APR_SUCCESS)
> 3. Call apr_brigade_length to determine length of brigade.
> 4. Do we have enough space in our caller's buffer to handle 
>    this?  (If we're not doing the allocation for them, that is.)
> 5. Run through all buckets in the brigade and read/copy into
>    the caller's buffer.  (What happened to zero-copy?)

We don't do zero-copy on input.  The real benefit to zero-copy
is on output, because the data tends to be larger.

> 6. Does last bucket's data (APR_BRIGADE_LAST) have LF char?
>    - If no LF, go to step 1.
> 7. We now have a LF-line (otherwise, we'd error out).
> 8. Attempt to compress the EOL by removing any CR or whitespace.
>    - This would maintain backwards compatibility, but may no 
>      longer be worthwhile.
> 9. Are we asked to handling folding?
>    - If not, then translate the buffer from ascii and return.
> 10. Call ap_get_brigade with PEEK.
>    - Should this PEEK call block?  Is that an option that the
>      caller to ap_get_brigade specifies?  I think ap_getline
>      would indeed want it to be a blocking PEEK.

If we are going to conitnue to extend input filtering, we should
probably split the mode from the blocking.

>    - This peek is not the same as current AP_MODE_PEEK (which 
>      just eats CRLFs).  It would require some modifications to 
>      the current input filters to ensure that each filter does
>      *NOT* delete the returned data from its input stack, but
>      instead copies the returned buckets.  It seems that this 
>      would be possible by doing a bucket copy (which should 
>      always succeed since we are dealing with already-read 
>      data).
>         - Let's consider how core_input_filter would handle this:
>           Have it treat it as a *readbytes != 0 case with the 
>           following exception: it will copy *all* buckets in b
>           back to ctx->b before returning.
>         - mod_ssl: It would convert it from an AP_MODE_PEEK read
>           to a *readbytes (AP_IOBUFSIZE) and read the data 
>           normally.  In ssl_io_filter_Input, it will do as 
>           core_input_filter does and leave the data in ctx->b 
>           to be read again (and return it too).  Note that 
>           core_input_filter will *not* see this read as a PEEK, 
>           but as a normal ap_get_brigade call (*readbytes != 0).
>         - HTTP_IN: Gets out of its way and lets it be passed 
>           down without any verification on its part.  (Should
>           it be concerned about reading past the end-of-request??)
>         - xlate_in_filter: Can save the buffer in its context
>           for later reading.
> 11. Was any data read?
>    - If not, translate the buffer from ascii and return.
> 12. Is the first character a tab or space?
>    - If not, translate the buffer from ascii and return.
> 13. Go to step 1.

Since the PEEK function is returning data now, you will need to compare
the data read through the PEEK mode to the next chunk of data read to
be sure that they are identical.

Other than that, it looks good.


Ryan Bloom
Covalent Technologies

View raw message