httpd-dev mailing list archives

Subject Re: [PATCH] ap_add_filter ( Example CE/TE filter )
Date Thu, 17 Aug 2000 18:55:44 GMT

In a message dated 00-08-17 17:25:44 EDT, Ryan writes...

>  > If a filter is now allowed ( as I think it should be )
>  > to insert ANOTHER filter immediately AFTER
>  > itself... where is the TYPE checking in 
>  > ap_add_filter() and where is the code that makes
>  > sure the mixing and matching of CONTENT 
>  > versus TRANSPORT filter types stays straight?
>  There isn't any.  This is done by how Apache works in
>  general.  Basically filters should only be inserting content filters,
>  because that is all filters really know.  

Should and can are 2 different things. If the design spec says it
is possible to insert any type of filter ( CONTENT or TRANSPORT )
immediately after your own 'filter' and ap_add_filter() allows that to happen
without giving an error then you can be sure someone will try it.

> Transport filters are really HTTP specific filters, and they should only 
> be installed by Apache proper, which knows how to install them in the 
> right space.  

So there will ( must? ) be a return code from ap_add_filter() which
says 'Sorry.... you can't do that' or 'Wrong place to do this'?
Right now there isn't.
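
To make the question concrete, here is a minimal sketch of the kind of
check I am asking about. Every name in it ( filter_type, add_status,
add_filter_after ) is hypothetical and is NOT the real ap_add_filter()
interface. It only assumes that CONTENT filters must stay ahead of
TRANSPORT filters in the chain.

```c
#include <stddef.h>

/* Hypothetical names throughout; none of this is the real Apache API. */
typedef enum { FILTER_CONTENT, FILTER_TRANSPORT } filter_type;

typedef enum { ADD_OK = 0, ADD_WRONG_PLACE = -1 } add_status;

typedef struct filter {
    const char *name;
    filter_type type;
    struct filter *next;
} filter;

/* Insert `f` immediately after `prev`, refusing any placement that
 * would leave a CONTENT filter downstream of a TRANSPORT filter. */
add_status add_filter_after(filter *prev, filter *f)
{
    if (prev == NULL || f == NULL)
        return ADD_WRONG_PLACE;

    /* A CONTENT filter may not follow a TRANSPORT filter. */
    if (f->type == FILTER_CONTENT && prev->type == FILTER_TRANSPORT)
        return ADD_WRONG_PLACE;

    /* A TRANSPORT filter may not land ahead of a CONTENT filter. */
    if (f->type == FILTER_TRANSPORT && prev->next != NULL
        && prev->next->type == FILTER_CONTENT)
        return ADD_WRONG_PLACE;

    f->next = prev->next;
    prev->next = f;
    return ADD_OK;
}
```

With something like this the caller at least gets told 'Wrong place to
do this' and the chain is left untouched.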

>  I guess if you could give an example of a module wanting to install a 
>  filter that is a transport filter, that would help.  

I can do better than that.

I will give you an example of a module that might NEED to install
either one or the other, depending on what the headers say.

My own self-interest and my hopes for what Apache 2.0 will be
able to do are about to show here... so bear with me...

The support of IETF Content-Encoding out in the world at large
is an absolute mess. There are major releases of all major browsers
that do everything from sending 'Accept-Encoding: gzip, compress' 
when there isn't even one lick of code in them to support it, to other
release versions that withdraw the header field when, in fact, full
support for the scheme is actually present.

To make matters worse... there are other major versions that
send 'Accept-Encoding: gzip, compress' but will ONLY handle
a compressed return stream as EITHER CE ( Content-Encoding )
or TE ( Transfer-Encoding ) but not both.

The problem is that the 'Accept-Encoding: whatever' field itself
makes no distinction between CE and TE and you are supposed
to assume that if a user agent says 'Accept-Encoding: whatever'
then it's SUPPOSED to be able to handle it as either CE or TE.

Reality just doesn't even come close to that.

There really needed to be ANOTHER header field that allows
a user-agent to specify whether it supports the Encoding 
scheme as ONLY CE or TE ( or both if it can ) but that's
a whole 'nother story. Don't even get me started on what
is missing from the HTTP protocol.

So a request arrives to a new Apache Server that, finally,
has SOME ability to satisfy the now 3 year old capability
of some user-agents with regards to dynamic CE or TE.

Apache taps a 'compression' hook filter module on the
shoulder and lets it look at the headers and decide whether
or not to call 'ap_add_filter()'.

At this point... it is perfectly possible that the filter module
might have to decide whether to install a CONTENT filter
or a TRANSPORT filter ( or both ). It will all be based on 
a very complicated algorithm that takes all the user-agent
information into account since it cannot rely on the IETF
RFC itself since that's all a total mess as far as real-world
implementation goes.
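
A stripped-down sketch of that decision might look like the following.
The names ( ua_caps, choose_filters, the INSTALL_* values ) are
hypothetical, and the capability flags are assumed to come from some
per-browser lookup that is not shown here. The real algorithm would be
far more involved.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: what this particular user-agent release REALLY supports,
 * regardless of what its Accept-Encoding header claims. */
typedef struct {
    bool handles_ce;   /* can decode the scheme as Content-Encoding  */
    bool handles_te;   /* can decode the scheme as Transfer-Encoding */
} ua_caps;

enum { INSTALL_NONE = 0, INSTALL_CONTENT = 1, INSTALL_TRANSPORT = 2 };

/* Decide which kind of filter to install for `scheme`.
 * The header match is a crude substring test (so "gzip" would also
 * match "x-gzip"); a real parser would split on commas. */
int choose_filters(const char *accept_encoding, const char *scheme,
                   ua_caps caps)
{
    if (accept_encoding == NULL || strstr(accept_encoding, scheme) == NULL)
        return INSTALL_NONE;        /* scheme not even advertised       */
    if (caps.handles_ce)
        return INSTALL_CONTENT;     /* prefer CE when it really works   */
    if (caps.handles_te)
        return INSTALL_TRANSPORT;   /* fall back to TE                  */
    return INSTALL_NONE;            /* header is there, code is not     */
}
```

The point is only that the header by itself cannot settle the question.
The same 'Accept-Encoding: gzip, compress' can lead to a CONTENT filter,
a TRANSPORT filter, or nothing at all.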

If Apache filtering gets limited to each hook filter having
to stick with either CONTENT filtering or TRANSPORT
filtering ( but never both in the same code ) then obviously
the CONTENT filter would just 'pass off' to the separate
TRANSPORT filter if the User-Agent can only handle 
compressed TE and not CE. Reverse is true... if this is
a browser that is lying about full Encoding support and
it can ONLY handle compressed CE but not TE then 
the CE filter 'grabs on' and tells the TE filter to 'back off'.

How does it do that? Easy... it makes sure the output
header immediately says 'Content-Encoding: whatever'
so by the time the TE is tapped on the shoulder it MUST
know enough to see the CE header and back off.
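
As a rough sketch of that 'back off' test, assuming a simple array of
output headers ( the header struct and both function names are
hypothetical, and the lookup is case-sensitive only to keep it short ):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the response header table. */
typedef struct {
    const char *name;
    const char *value;
} header;

/* Return the value of the first header called `name`, or NULL.
 * Real header names are case-insensitive; exact match is a shortcut. */
const char *find_header(const header *h, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(h[i].name, name) == 0)
            return h[i].value;
    }
    return NULL;
}

/* The TE filter runs only if no CE filter has already claimed the body. */
bool te_filter_should_run(const header *out, size_t n)
{
    return find_header(out, n, "Content-Encoding") == NULL;
}
```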

Actually... this gets even worse... but let's not go there 
right now.

Ok... so what about a filter that might need to install 
another one AFTER itself that MIGHT be either CONTENT
or TRANSPORT?

Again... same example... I have Internet enabled algos that
can take another 30 percent out of anything that's GZIPPED
so what if I want to LET the CE Content-Encoding filter do
the GZIP pass but I also know the User-Agent can handle
the additional 'boosted' post-GZIP compression but only
as TE?... I simply insert another filter after the GZIP one
which will add TE encoding in addition to the CE GZIP.

That means my CE filter has to be able to add a TE
filter right after itself... or at least the additional TE
filter needs to kick in somewhere down the road before
the data starts leaving the box.

Why not just do BOTH compression passes in the 
CONTENT filter and be done with it?

Again... reality reaches out and smacks one in the face.

Not only can some browsers not even support GZIP but even
some of the ones that do will ONLY support 1 kind of 
Content-Encoding and won't even swallow a return header
that has more than one entry listed in 'Content-Encoding: whatever'.

Again... this is totally contrary to the RFCs but it's reality. If you
send "Content-Encoding: gzip, gzip_boost" you will
usually get the blue screen of death immediately... but a combination
of 'Content-Encoding: gzip' and 'Transfer-Encoding: gzip_boost'
will work.

If one of the goals of adding filtering to Apache is to ( finally ) be
able to fully support IETF schemes then this is the kind of
reality you are up against in the CE, TE arena.

>  Remember, most transport filters are installed based on header values, 
>  so the filter should set the header, and the core then inserts the correct 
>  filter in the right spot based on the header.

Yes... sounds good... but will it?

If I register a CONTENT filter and a TRANSPORT filter at startup
and the spark for the TRANSPORT filter hit is nothing more than
what some output header value says then who polices that part
of the equation? Are you saying that a field like 'Transfer-Encoding'
must always be 'appended' to like the 'Via:' field? What if some 
other Transport filter kicks in AFTER mine and doesn't pay any
attention to the fact that I have already added 
'Transfer-Encoding: gzip_boost' during my CONTENT pass?

If that 'filter' author just whacks the TE field and replaces
it with something else ( Like Transfer-Encoding: chunked )
without paying attention to what is already there 
then the second compression pass ( which was ALREADY performed ) 
disappears and the user-agent will surely crash since it doesn't know 
what to do with the data it receives.

Your own code calls 'apr_table_mergen()' which seems to be
the right thing to do but look what else you are doing without
paying attention to what has happened before you...

if (r->chunked) {
    apr_table_mergen(r->headers_out, "Transfer-Encoding", "chunked");
    apr_table_unset(r->headers_out, "Content-Length");
    ap_add_filter("CHUNK", NULL, r, NULL);
}

You are whacking the 'Content-Length:' field without checking
to see if something that has happened before you requires it
to stay there. Some things do.
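
Here is a small standalone sketch of the two habits I am arguing for:
append to 'Transfer-Encoding' instead of replacing it, and ask before
dropping 'Content-Length'. Both function names are hypothetical, and so
is the 'length_required_upstream' flag, which stands in for whatever
mechanism an earlier filter would use to say it needs the length kept.
The appending is the same idea as the merge call above, just written
against a plain buffer so it can be read on its own.

```c
#include <stdbool.h>
#include <string.h>

/* Append `token` to the comma-separated header value in `value`
 * (buffer capacity `cap`), keeping whatever is already there.
 * Returns false and leaves `value` untouched if it would not fit. */
bool merge_header_token(char *value, size_t cap, const char *token)
{
    size_t used = strlen(value);
    size_t need = strlen(token) + (used ? 2 : 0);   /* 2 for ", " */

    if (used + need + 1 > cap)
        return false;
    if (used)
        strcat(value, ", ");
    strcat(value, token);
    return true;
}

/* Hypothetical guard: only drop Content-Length for a chunked response
 * when nothing earlier in the chain has said it must stay. */
bool may_drop_content_length(bool chunked, bool length_required_upstream)
{
    return chunked && !length_required_upstream;
}
```

Starting from 'gzip_boost', merging in 'chunked' leaves
'Transfer-Encoding: gzip_boost, chunked', so the second compression
pass is still announced to the user-agent.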

Kevin Kiley
CTO, Remote Communications, Inc. - Online Internet Content Compression Server.
