httpd-dev mailing list archives

From Justin Erenkrantz <jus...@erenkrantz.com>
Subject Re: Ideas for Smart Filtering
Date Sun, 01 Aug 2004 09:18:40 GMT
--On Sunday, August 1, 2004 8:24 AM +0100 Nick Kew <nick@webthing.com> wrote:

>> I'm not sure what 'match' is in this context.
>
> In the above case, it could be "text/html" or "latin1".
>   ap_register_smart_filter("transcode", "latin1", charset_filter, ctx, flags);
>   ap_register_smart_filter("process", "text/html", html_filter, ctx, flags);
>
> But that really needs the flexibility of a regexp, so "latin1" becomes
>   "latin[-_]?1|iso[-_]?8859[-_]?1"
> or might expand to include other close relatives like iso-8859-15.

Having the overhead of regexps by default in our filter code would seem to be 
a severe bottleneck.  I'd rather avoid that, or push the cost onto the few 
specific modules that want the power of regexps and are willing to pay the 
ridiculous performance penalty.  The other significant thing missing from your 
API is what to match against.  (I think you are assuming Content-Type, but 
there are a lot of cases where you want to match against something other than 
Content-Type.)
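To make both points concrete, here is a minimal sketch of a dispatcher where each registration says *what* it matches against and defaults to a cheap case-insensitive string comparison, with a regexp used only by providers that opt in. Everything here (`match_field`, `req_meta`, `provider`, `dispatch`) is hypothetical and is not the proposed `ap_register_smart_filter` API; only `regcomp`/`regexec` and `strcasecmp` are real POSIX calls.

```c
#include <regex.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical: which piece of request metadata a provider is matched against. */
typedef enum { MATCH_CONTENT_TYPE, MATCH_CHARSET, MATCH_HANDLER } match_field;

/* Hypothetical request metadata; a stand-in, not httpd's request_rec. */
typedef struct {
    const char *content_type;
    const char *charset;
    const char *handler;
} req_meta;

typedef struct {
    const char *name;     /* provider name, e.g. "transcode" */
    match_field field;    /* what to match against */
    const char *literal;  /* cheap default: case-insensitive equality */
    regex_t *re;          /* optional: compiled once at registration, NULL if unused */
} provider;

static const char *field_value(const req_meta *r, match_field f)
{
    switch (f) {
    case MATCH_CONTENT_TYPE: return r->content_type;
    case MATCH_CHARSET:      return r->charset;
    case MATCH_HANDLER:      return r->handler;
    }
    return NULL;
}

/* Return the first matching provider, or NULL.  Only providers that opted
 * into a regexp pay for regexec(); everyone else gets a strcasecmp(). */
const provider *dispatch(const provider *p, size_t n, const req_meta *r)
{
    for (size_t i = 0; i < n; i++) {
        const char *v = field_value(r, p[i].field);
        if (v == NULL)
            continue;  /* request has no value for this field */
        if (p[i].re != NULL) {
            if (regexec(p[i].re, v, 0, NULL, 0) == 0)
                return &p[i];
        } else if (p[i].literal != NULL && strcasecmp(v, p[i].literal) == 0) {
            return &p[i];
        }
    }
    return NULL;
}
```

A charset provider would compile something like `^(latin[-_]?1|iso[-_]?8859[-_]?1)$` once with `REG_EXTENDED | REG_ICASE | REG_NOSUB`, so the per-request cost is one `regexec` for that provider only.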

OtherBill said:
> It would be nice in Apache 2.2 to finally clean up this contract, with two
> simple metadata elements to pass through the filter chain:
>
> . this request is unfiltered
> . this request has a 1:1 filter (stateless)
> . this request has an arbitrary content transformation
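One way to read the three cases is as an ordered classification where a chain is only as well-behaved as its worst filter. The sketch below is hypothetical (the names are not from any httpd API) and assumes that "1:1" means length-preserving, which the quoted text does not state outright.

```c
#include <stddef.h>

/* Hypothetical encoding of the three quoted cases, ordered from least to
 * most disruptive so that "worse" compares greater. */
typedef enum {
    XFORM_NONE = 0,    /* request is unfiltered */
    XFORM_ONE_TO_ONE,  /* stateless 1:1 filter; ASSUMED length-preserving */
    XFORM_ARBITRARY    /* arbitrary content transformation */
} xform_kind;

/* The chain as a whole is classified by its worst member. */
xform_kind chain_kind(const xform_kind *filters, size_t n)
{
    xform_kind worst = XFORM_NONE;
    for (size_t i = 0; i < n; i++)
        if (filters[i] > worst)
            worst = filters[i];
    return worst;
}

/* Under the length-preserving assumption, a Content-Length computed up front
 * stays valid unless some filter in the chain is arbitrary. */
int content_length_survives(const xform_kind *filters, size_t n)
{
    return chain_kind(filters, n) != XFORM_ARBITRARY;
}
```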

Remember that the content-length doesn't even need to be set *before* we go 
into the filters.  (The fact that default_handler does so is more of an 
accident than anything else.)  The content-length header is *not* normative 
and should almost always be ignored.  (Of course, that applies only internally 
to httpd; to the client, if it sees one, it *is* normative.)  The definitive 
and authoritative size is whatever makes it through the filters in the form of 
buckets and brigades.  It is not efficient to constantly recompute the length 
as we push data through the filters.
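A minimal sketch of "the size is what makes it through": a counter near the end of the chain adds up the bytes it actually receives and treats the total as final only at end-of-stream. The `bucket` and `length_ctx` types are hypothetical simplifications; real httpd code works with APR buckets and brigades (APR-util's `apr_brigade_length()` computes the length of a single brigade).

```c
#include <stddef.h>

/* Hypothetical stand-in for a bucket: a data chunk or an end-of-stream marker. */
typedef struct bucket {
    const char *data;
    size_t len;
    int is_eos;
    struct bucket *next;
} bucket;

/* Hypothetical per-request state for a counting filter late in the chain. */
typedef struct {
    size_t bytes_seen;
    int done;
} length_ctx;

/* Accumulate the size of whatever actually reaches this point.  The total is
 * authoritative only once EOS has been seen; a Content-Length set earlier is
 * just a hint.  Returns 1 when the length is final, 0 otherwise. */
int count_brigade(length_ctx *ctx, const bucket *b)
{
    for (; b != NULL; b = b->next) {
        if (b->is_eos) {
            ctx->done = 1;
            break;
        }
        ctx->bytes_seen += b->len;
    }
    return ctx->done;
}
```

Each byte is counted once, at the one place the value is needed, rather than every filter recomputing a length as data flows past.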

So, if a filter relies on the content-length HTTP metadata header rather than 
the brigades it sees, then it's severely broken.  Trying to force filters to 
pre-declare what they will do is, IMHO, silly and pointless.  I don't see how 
pre-declaring a filter's intentions is going to provide any real benefit.  
Nothing can make use of that knowledge anyway, because filters have to handle 
every case regardless!  So any gain from corner-case optimization is lost to 
the complexity we just added.

Nick said:
> You're right that these are two separable tasks, and in fact the filter
> dispatcher is the part I have implemented, whereas the protocol handling
> is merely a proposal.  I'd be interested to hear other views on the
> subject.  Are you disagreeing with my quote from Bill Rowe above,
> or merely with my proposed solution to that problem?

More with OtherBill's comment, I guess.  ;-)

I don't necessarily have a problem with the dispatch, although I think your 
proposal skips over a necessary level of abstraction.  A separate function 
would be better, IMHO.  A set of helper 'init' functions on top of that layer 
would satisfy those who want to do some common operations in their init 
functions.  But the real key here is to provide a way for a filter to remove 
itself independently of its delivery function.  Once that is in place, the 
helper init functions (like the ones your proposal suggests) are fairly 
trivial to write and can be provided by something outside of the httpd filter 
API.
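A rough sketch of that layering: each filter gets an init hook, separate from delivery, that may ask to be unlinked before any data flows, and a generic helper init is then just an ordinary function on top. All types and names here (`request`, `filter`, `init_chain`, `init_require_type`) are hypothetical simplifications, not the httpd filter API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified request and filter types; not httpd's API. */
typedef struct {
    const char *content_type;
} request;

typedef struct filter filter;
/* Init hook: runs once before any data; returns nonzero to be removed. */
typedef int (*filter_init_fn)(filter *f, const request *r);
/* Delivery hook: only ever called on filters that survived init. */
typedef void (*filter_deliver_fn)(filter *f, const char *data, size_t len);

struct filter {
    const char *name;
    const char *wanted_type;  /* consulted by the helper init below */
    filter_init_fn init;
    filter_deliver_fn deliver;
    filter *next;
};

/* Run every init hook and unlink filters that ask to be removed, so the
 * removal decision never has to live inside a delivery function. */
filter *init_chain(filter *head, const request *r)
{
    filter **link = &head;
    while (*link != NULL) {
        filter *f = *link;
        if (f->init != NULL && f->init(f, r) != 0)
            *link = f->next;  /* drop f from the chain */
        else
            link = &f->next;
    }
    return head;
}

/* Hypothetical generic helper init: keep the filter only when the response
 * Content-Type matches exactly (parameters such as charset not handled). */
int init_require_type(filter *f, const request *r)
{
    if (r->content_type == NULL || f->wanted_type == NULL)
        return 1;
    return strcmp(r->content_type, f->wanted_type) != 0;
}
```

Because removal happens in `init_chain`, any module could supply its own helper inits without the core dispatcher knowing about them.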

My $.02.  -- justin
