httpd-dev mailing list archives

Subject: Re: Thoughts on filter-chain composition
Date: Mon, 11 Sep 2000 19:41:52 GMT

In a message dated 00-09-11 17:23:33 EDT, Tony Finch writes...

> Rodent of Unusual Size <Ken.Coar@Golux.Com> wrote:
>  > wrote:
>  >> 
>  >> Would it be up to each and every filter to check for the existence
>  >> of the 'Content-encoding' field when the buckets start showing up
>  >> and then do its own response field checking to see if it knows how
>  >> to handle whatever that might be?
>  >
>  >I believe so, yes.
>  Hmm, I think if the filter can't handle the data stream then it
>  shouldn't be there in the first place. 

How does any filter really know what is going to happen before it?
It's a runtime thing. Sure... some filtering is so unbelievably simple
that it can either 'be there or be square', but there will always be cases
where the morphing process begins and the actual nature of the
filter's output can only be determined during examination
of the inbound data stream itself.

> However I agree that this is
> hard for the code to predict in advance, so ...

Not only hard... in most real-world useful filtering implementations
it is going to be impossible for a pre-determined 'static' chain
to get the job done right. See below.
>  >> We are right back to the slippery slopes of who's responsible
>  >> for tracking all filter 'dependencies' again. Does the filtering
>  >> engine have any responsibility or is the onus always going
>  >> to be on the filter writer to check for ALL possible 'no can do'
>  >> scenarios?
>  >
>  >I think the latter.  That's modular and extensible.
>  ... I think it should be the responsibility of the server
>  administrator to configure it properly so that the problem
>  doesn't occur.
>  Tony.

If the only filtering possibilities are for KFIKFO ( Known Format
In, Known Format Out ) just like the NDIS ODI protocol stack(s)
then sure, this is fine. What admin would ever just 'expect' the
whole TCP/IP LAN NDIS/ODI stack to work unless all the
right drivers are loaded in the right order and pointing at
each other in the right way? It has to be right, and it has to be
a known path of content formats travelling in a fixed direction
or it all falls down.

However... filtering content is not like translating protocols.

Things get a bit more schizophrenic as the content gets
'changed' from one filter to another. Downstream filters
might have to be just a little 'smarter' than some dumb
NDIS ODI layer code that is just pushing packets around
without caring what's really inside them.

Best example I can think of is a filter that actually works
right now and is VERY necessary for any next generation
HTTP Server...

The dynamic filtering of graphics content for different User Agents.

I have a version of Apache that does that very thing and 
let me tell you how it all had to come out in the wash
and perhaps you will see that a fixed CONFIG file might
not do the whole trick....

Prime Directive: Examine the User-Agent field and look
for the word 'Elaine' in it. If you see that word in the User-Agent
field along with the standard 'Mozilla x/x' User-Agent Netscape
fakery ( common practice these days to imitate Netscape User
Agent field ) then you know the request comes from any one
of about 4 MILLION Palm Pilots with a wireless modem making
requests for pseudo-html via Palm Computing's Proxy Server.

You also now know that you must either convert all graphics
data to 4-bit grayscale or it cannot be safely returned to the
User-Agent 'Elaine'. If you send any graphics other than
4-bit grayscale, the Palm Pilot could crash badly and may only
be recoverable by shoving the old paper clip into the hole
in the back and then seeing if you have lost all your bookmarks,
phone numbers, addresses, and stored memos. Not good.

Simple, right? Just convert all graphics to 4-bit grayscale using
freely available NetPBM libraries running as real time 'filters' hooked
into the HTTP request fulfillment stream(s).

Wrong. Not simple. Here's why.

1. The only sane way to do it is to break up ALL of the different
generic tasks involved into separate 'filters' that can be
inserted or re-ordered 'on the fly' by any other
filter on an 'as needed' basis. The filtering engine itself
can/should HELP the filters pull this off, or at least not
PREVENT them from doing so.

2. No conversion layer can really ever be sure what is going to
happen before it or if it will even be needed. That all gets 
determined WHILE the process is happening.

3. Any single filter might have to perform any one of a 
number of 'passes' at itself in an order and a fashion that
can only be determined after the process has begun.
The number of passes and the actual ORDER can be
different for each type of graphics file and each filter
must have the ability to set the iterations AND the order
in which they are performed.

A practical example... ( requested object is assumed to be on
the same Apache Server that can do the filtering... )

- An 'Elaine' ( Palm Pilot ) User-Agent makes a request for
a standard 256 color .GIF file.
- Server locates the file on the local hard drive and sets it
up as the object to be delivered via 'sendfile()'.
- Primary graphics morphing filter kicks in and, lo and behold,
discovers that the .GIF file itself is already in 4 bit grayscale.
Nothing needs to be done. Any hard-coded graphics conversion
filtering chain can now be skipped and object sent 'as is'.
- Otherwise, the primary graphics morphing filter sees that it
has work to do to get this puppy into 4-bit grayscale format.
- Primary filter then has to determine if it's an Interlaced
.GIF file or not and, if so, the whole filtering chain will 
look 'different'. NetPBM uses different 'passes' at the file
to convert the interlacing into .PBM Portable Bitmap format
and all that now has to be done FIRST before any of the
'other' filters can even start their part of the conversion 
process. If the image is NOT Interlaced then only one
pre-pass to PBM working format has to be made, but
all this means is that you never really know which
sub-filters are needed until you start working.
- Once the Interlaced/Non-Interlaced filtering passes
are completely done... only then does the real fun
begin since you must now make OTHER decisions
about which filters come next in the process. 
IF the image is 'too big' for the User-Agent then the
scaling filter must now take a pass, or not.
Depending on the age of the image you might have
to call one of a number of DIFFERENT LZ77 decompression
filters on the frame data itself.
Depending on the version of 'Elaine' you may or may
not have to call a 'GIF extension stripping filter' to take
out everything the target won't be able to deal with.
Somewhere in there the actual color conversion(s)
take place and then it's on to another ( dynamically
determined ) set of 'filters' that are simply going to
sew it all back together again into the right GIF
output format for the requesting User-Agent.

And that's only GIF. It's all DIFFERENT again for .PNG,
.JPG, .XBM, etc. They ALL have to be converted and the steps
involved are all different based on what's in the file itself.

Sounds like a mess, I know, but as long as the 'filters'
themselves are able to determine what happens next... 
it all works in the blink of an eye and everything is fine.

I know that the 'quick answer' to this is... 'If NetPBM is so
layered then just don't use it because it doesn't fit into
a statically determined MIME type filtering scheme'.

Long answer is that ANY code that is trying to accomplish
the same content filtering will pretty much need to look 
exactly the same. There just really isn't any other 
practical way to do it other than the way NetPBM is
ALREADY doing it.

The ONLY reason I am going on about this is because
it's 'pucker' time as far as the design is concerned. It
doesn't matter if the first incarnations of Apache content
filtering are actually ABLE to do anything this sophisticated
right off the bat... but it would be a shame if it turns out 
that it never can because the design won't permit it.

"You can usually recover from production flaws"
"You can never fully recover from a flawed design."
Werner Von Braun - The man who took us to the moon.

Kevin Kiley
CTO, Remote Communications, Inc. - Online Internet Content Compression Server
