httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rici Lake <r...@ricilake.net>
Subject Re: AP_MODE_EATCRLF considered indigestible
Date Fri, 15 Apr 2005 15:16:29 GMT

On 15-Apr-05, at 4:55 AM, Nick Kew wrote:

> Rici Lake wrote:
>> I was taking a look at the implementation of the renamed (but still
>> misleading) AP_MODE_EATCRLF, in an attempt to figure out what a
>> conforming input filter ought to do if it's handed that mode.
>
> I think that really depends on your filter.  I have two strategies
> there (setting aside simply ignoring the mode, which is wrong but
> works if you know where the filter's being used).
>
> * If the filter is happy to work with lines, get data from the next
>   filter using the same mode and process it that way.
> * Otherwise, assume the filter is unwanted, and remove itself from
>   the filter chain.

Perhaps I wasn't being clear. AP_MODE_EATCRLF is not a renaming
of AP_MODE_GETLINE; it is a renaming of AP_MODE_PEEK. (The renaming
happened some three years ago in revision 92928, obviously I'm not
al tanto.) To be more precise, it's a renaming of what the behaviour
of AP_MODE_PEEK was.

The documented behaviour of AP_MODE_EATCRLF is:
/** The filter should implicitly eat any CRLF pairs that it sees. */

The actual behaviour (at least in the only implementation of it in the 
tree) is somewhat more complex:

1) Regardless of the readtype, it only reads non-blocking
2) Similar to MODE_SPECULATIVE, it retains data for subsequent reads 
(but see below)
3) It does not, however, ever return any data.
4) It returns APR_SUCCESS only if data is immediately available. In 
determining whether or not data is immediately available, it ignores 
([CR]?[LF})+ (but with a bug as noted in my first message)
5) If the only data immediately available is ignored line-endings as 
per (4), these line endings are not retained for subsequent reads. 
(Obviously, this cannot be relied upon to delete leading CR?LF's, since 
the scan only applies to currently available data.)

My argument is that this is extremely idiosyncratic behaviour and has 
no place in a reasonable interface design. It seems to me that the only 
consumer of AP_MODE_EATCRLF could just as well use AP_MODE_SPECULATIVE 
(which is implemented by the only provider which implements 
AP_MODE_EATCRLF) to do what it wants to do. Consequently, 
AP_MODE_EATCRLF could just be honourably dispatched to the retirement 
home for awkward interfaces, and imho we'd all be better off.

The following (as-yet-untested) code should do the trick. It is 
somewhat longer than the existing check_pipeline_flush but that needs 
to be balanced against the removal of an equal-sized chunk of 
server/core_filters.c and the end of AP_MODE_EATCRLF:

static void check_pipeline_flush(request rec *r)
{
     apr_bucket *e
     apr_bucket_brigade *bb;
     conn_rec *c = r->connection;
     bb = apr_brigade_create(r->pool, c->bucket_alloc);
     if (c->keepalive != AP_CONN_CLOSE) {
         /* The 8 is entirely arbitrary. 4 is probably sufficient */
         if (ap_get_brigade(r->input_filters, bb, AP_MODE_SPECULATIVE,
                            APR_NONBLOCK_READ, 8) == APR_SUCCESS) {
             const char *buf;
             apr_size_t len, i;
             int gotcr = 0;
             int gotdata = 0;
             for (e = APR_BRIGADE_FIRST(bb);
                  e != APR_BRIGADE_SENTINEL(bb);
                  e == APR_BUCKET_NEXT(e)) {
                  if (APR_BUCKET_IS_METADATA(e)
                      || apr_bucket_read(e, &buf, &len,
                                      APR_NONBLOCK_READ) != APR_SUCCESS) 
{
                      break;
                  }
                  for (i = 0; i < len; ++i) {
                      if (buf[i] == AP_ASCII_LF)
                          gotcr = 0;
                      else if (gotcr || buf[i] != AP_ASCII_CR) {
                          gotdata = 1;
                          break;
                      }
                      else
                          gotcr = 1;
                  }
                  if (gotdata) break;
             }
             if (gotdata) {
                 c->data_in_input_filters = 1;
                 apr_brigade_destroy(bb); /* possibly unnecessary */
                 return;
             }
         }
     }
     c->data_in_input_filters = 0;
     e = apr_bucket_flush_create(c->bucket_alloc);
     APR_BRIGADE_INSERT_HEAD(bb, e);
     ap_pass_brigade(c->output_filters, bb);
}

> Agreed, with the proviso that it doesn't really complicate things that
> much, as a filter can uninsert itself.

If the read mode is line-oriented, the filter really should handle it 
rather than uninserting itself, unless the filter believes itself to be 
filtering non-textual information.

> Maybe we should have a declarative version of this, in the manner of
> mod_filter's protocol handling for output filters.  So that any input
> filter passes an OR of input modes it works in to util_filter, and
> never gets inserted when called in a mode it doesn't want to support?

Perhaps, but you don't know at the point of insertion what read mode 
will be used. I would prefer to be forced to handle all read modes on 
the understanding that these modes are all reasonable, documented and 
that there are not too many of them :)

Suppose that GETLINE was not implemented by a filter but that 
SPECULATIVE was. In that case, the consumer could implement GETLINE 
itself by doing a speculative get_brigade of the maximum linelength; 
finding the line ending if any, and then doing a normal get_brigade of 
the number of bytes to the line-ending (thereby leaving the remaining 
bytes in the downstream filter's speculative setaside buffer).

It would probably be desirable to avoid doing a blocking read beyond 
the line-ending, so this could be improved by doing a non-blocking 
speculative get_brigade and then, if no line-end is detected doing a 
blocking get_brigade of one more than the number of bytes returned by 
the non-blocking speculative get_brigade (and then repeat as 
necessary).

That could all be done by a provided utility function. GETLINE mode is 
really just an optimisation around the setaside in the above algorithm, 
so it could be made 100% optional for an input filter; the getline 
utility could try the GETLINE mode get_brigade and, if it received a 
not-implemented indication from the filter, fall back to the above 
algorithm; the caller of the getline utility could remain blissfully 
unaware and the only complexity added to the implementer of a naive 
input filter would be returning NOTIMPL if it gets a mode which it 
can't handle.

I presume from the comment in the definition of SPECULATIVE mode that 
something like this was always in the plans.


Mime
View raw message