httpd-dev mailing list archives

From Nick Kew
Subject Re: Ideas for Smart Filtering
Date Sun, 01 Aug 2004 07:24:21 GMT
On Sat, 31 Jul 2004, Justin Erenkrantz wrote:

> Yet, I'm not sure I understand the intent of your proposal.  Is it that you
> don't like the fact that each filter has to make a decision on whether it
> should stick around?

Essentially, yes.  We already have a double-digit number of content
filters in one application, and that's growing.

> So, what you are proposing to do is to abstract those
> two decisions into separate functions - i.e. decide whether to accept, and
> another to perform the filter?

I'm currently running something like this using the ap_provider API.
The reason I'm not proposing just to use that is that we want more
flexibility.  For example "Content-Type: text/html;charset=latin1"
is two different keys we might wish to dispatch on, while a Cookie
could enumerate an arbitrary number.
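
To illustrate the two-keys point, here is a minimal standalone sketch of
splitting a Content-Type value into a media-type key and a charset key.
The names ctype_keys and split_content_type are hypothetical and not part
of any Apache API. The sketch assumes a lowercase "charset=" parameter
name and does not handle quoted parameter values.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type, not an Apache API: the two dispatch keys
 * carried by a single Content-Type header value. */
typedef struct {
    char media_type[64];  /* e.g. "text/html" */
    char charset[64];     /* e.g. "latin1"; empty string if absent */
} ctype_keys;

/* Copy n bytes of src into dst, trimming surrounding whitespace and
 * truncating to fit dstlen (including the terminating NUL). */
static void copy_token(char *dst, size_t dstlen, const char *src, size_t n)
{
    while (n > 0 && isspace((unsigned char)*src)) { src++; n--; }
    while (n > 0 && isspace((unsigned char)src[n - 1])) n--;
    if (n >= dstlen) n = dstlen - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
}

static void split_content_type(const char *value, ctype_keys *out)
{
    assert(value != NULL && out != NULL);
    out->media_type[0] = '\0';
    out->charset[0] = '\0';

    /* Everything before the first ';' is the media type. */
    const char *semi = strchr(value, ';');
    size_t type_len = semi ? (size_t)(semi - value) : strlen(value);
    copy_token(out->media_type, sizeof out->media_type, value, type_len);

    /* Walk the ';'-separated parameters looking for charset=... */
    while (semi != NULL) {
        const char *param = semi + 1;
        while (isspace((unsigned char)*param)) param++;
        semi = strchr(param, ';');
        size_t plen = semi ? (size_t)(semi - param) : strlen(param);
        if (plen > 8 && strncmp(param, "charset=", 8) == 0) {
            copy_token(out->charset, sizeof out->charset,
                       param + 8, plen - 8);
            return;
        }
    }
}
```

Each of the two resulting strings can then be offered to the dispatcher
as a separate key.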

> > ap_register_smart_filter(name, match, filter_func, ctx, protocol_flags)
> >
> > Now when the harness name is inserted in the filter chain, and there is a
> > match with match, lookup_handler (referenced above) will return our
> > filter_func for filter name.
> I'm not sure what 'match' is in this context.

In the above case, it could be "text/html" or "latin1".
  ap_register_smart_filter("transcode", "latin1", charset_filter, ctx, flags);
  ap_register_smart_filter("process", "text/html", html_filter, ctx, flags);

But that really needs the flexibility of a regexp, so that "latin1"
might expand to include other close relatives like iso-8859-15.
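
A rough standalone sketch of regexp-based dispatch, using POSIX regex.h
rather than anything in httpd. The names smart_register and smart_lookup,
the fixed-size table, and the example pattern are assumptions for
illustration only; the protocol flags argument is left out for brevity,
and this is not the code I am running.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Standalone sketch; none of these names exist in Apache httpd. */
typedef int (*filter_fn)(void *ctx);

typedef struct {
    const char *harness;  /* harness name, e.g. "transcode" */
    regex_t     match;    /* compiled pattern tested against the key */
    filter_fn   func;     /* filter to run when the pattern matches */
    void       *ctx;
} smart_provider;

#define MAX_PROVIDERS 16
static smart_provider providers[MAX_PROVIDERS];
static size_t n_providers = 0;

/* Register func under a harness name; returns 0 on success, -1 if the
 * table is full or the pattern does not compile. */
static int smart_register(const char *harness, const char *pattern,
                          filter_fn func, void *ctx)
{
    assert(harness != NULL && pattern != NULL && func != NULL);
    if (n_providers == MAX_PROVIDERS)
        return -1;
    smart_provider *p = &providers[n_providers];
    if (regcomp(&p->match, pattern,
                REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return -1;
    p->harness = harness;
    p->func = func;
    p->ctx = ctx;
    n_providers++;
    return 0;
}

/* Return the first filter registered for this harness whose pattern
 * matches key, or NULL if none does. */
static filter_fn smart_lookup(const char *harness, const char *key)
{
    for (size_t i = 0; i < n_providers; i++) {
        const smart_provider *p = &providers[i];
        if (strcmp(p->harness, harness) == 0
            && regexec(&p->match, key, 0, NULL, 0) == 0)
            return p->func;
    }
    return NULL;
}

/* Placeholder filters standing in for charset_filter and html_filter. */
static int charset_filter(void *ctx) { (void)ctx; return 1; }
static int html_filter(void *ctx)    { (void)ctx; return 2; }
```

With a pattern such as "^(latin1|iso-8859-(1|15))$" registered for the
"transcode" harness, a lookup with either "latin1" or "iso-8859-15"
returns the same filter, and a key like "utf-8" returns nothing.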

> What is the point of protocol_flags?

Cf. the recent thread on handling byteranges.  Bill Rowe expressed
the problem rather well in that thread.  In view of your request not
to cite URLs for substantive discussion, I'll quote from his post:

The confusion results because mod_proxy isn't implemented as a content
handler, it's a protocol handler in its own right.  Rather than insist on
the mod_http <> mod_proxy agreeing to streamline the response, we've put
it on every content module author to:

. remove output C-L header if the size is transformed
. remove input range headers if the content isn't 1:1 transformed

This is very kludgy and more an example of where mod_http <> mod_proxy
didn't quite get it right, and made it a little more difficult for folks
who are just trying to transform content bodies.

It would be nice in apache 2.2 to finally clean up this contract, with two
simple metadata elements to pass through the filter chain:

. this request is unfiltered
. this request has a 1:1 filter (stateless)
. this request has an arbitrary content transformation
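
One way the three states above might be turned into header handling, as
a standalone sketch. The enum and function names are hypothetical. It
assumes that a 1:1 filter preserves both length and byte offsets, and
that a chain is classified by its most intrusive filter; neither
assumption comes from the quoted post.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; ordered so that a larger value means a more
 * intrusive transformation. */
typedef enum {
    XFORM_NONE = 0,       /* this request is unfiltered */
    XFORM_ONE_TO_ONE = 1, /* stateless 1:1 filter */
    XFORM_ARBITRARY = 2   /* arbitrary content transformation */
} xform_kind;

typedef struct {
    int drop_content_length; /* remove output C-L header */
    int drop_range;          /* remove input range headers */
} header_fixups;

/* Combine the declarations of every filter in the chain: the chain is
 * treated as being as intrusive as its most intrusive member. */
static xform_kind chain_kind(const xform_kind *filters, size_t n)
{
    assert(filters != NULL || n == 0);
    xform_kind worst = XFORM_NONE;
    for (size_t i = 0; i < n; i++)
        if (filters[i] > worst)
            worst = filters[i];
    return worst;
}

/* Decide the header fixups once, centrally, instead of in each filter. */
static header_fixups fixups_for(xform_kind kind)
{
    header_fixups f = { 0, 0 };
    if (kind == XFORM_ARBITRARY) {
        f.drop_content_length = 1;
        f.drop_range = 1;
    }
    return f;
}
```

The point of the sketch is only that the decision is made in one place
from declared metadata, so that individual content filters need not
touch the headers themselves.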

> Why should the filter be forced to
> pre-declare these decisions?  Why can't I determine that dynamically?

No one forces it.  A filter that wants to take charge of protocol decisions
is free to do so.  But requiring every filter to do so is a burden on
filter writers, and is bug-prone (cf. the number of ways to generate
a bogus Content-Length on a HEAD request).

> I
> think it'd be a bad idea to key such HTTP/1.1 protocol issues in the filter
> API.  I think we should maintain protocol-agnosticism where possible.

I have to disagree there.  There are certain wheels I don't want to have
to redesign every time I implement a content filter.  A filter that wants
to take full responsibility itself should be able to do so, but bearing
in mind that whatever one filter does may be overridden by another.

> So, to sum up: splitting out the decision whether the filter should run from
> its filter function sounds fine.  But, I think the Filter* directives
> abstract too much in this particular case.  Let the filter itself decide.  --
> justin

You're right that these are two separable tasks, and in fact the filter
dispatcher is the part I have implemented, whereas the protocol handling
is merely a proposal.  I'd be interested to hear other views on the
subject.  Are you disagreeing with my quote from Bill Rowe above,
or merely with my proposed solution to that problem?

Nick Kew
