httpd-dev mailing list archives

Subject Re: Thoughts on filter-chain composition
Date Tue, 12 Sep 2000 04:25:46 GMT

In a message dated 00-09-12 02:24:19 EDT, Tony Finch writes...

>> TOKILEY wrote...
>> The successful filtering of the content does not simply depend
>> on MIME type in and MIME type out. It all depends on what is
>> actually INSIDE the object itself.
> Hmm, yes, I did miss a big point there. Try writing more 
> concisely so that it's easier to spot the part of your 
> message that has something new to say :-) 

Thanks for the advice, I will. 

Try reading more closely so that you don't miss the
parts of a message that have something new to say :)

The following is from the very first paragraph of the 
message you responded to...

> ...there will always be cases where the morphing process 
> begins and the actual nature of the output of the filter
> can only be determined during the examination
> of the inbound data stream itself...

A little farther down ( same message )...

> 2. No conversion layer can really ever be sure what is going to
> happen before it or if it will even be needed. That all gets 
> determined WHILE the process is happening.

A little farther down ( same message )...

> ...and that's only GIF. It's all DIFFERENT again for .PNG
> .JPG, .XBM, etc. They ALL have to be converted and the steps 
> involved are all different based on what's in the file itself.

Other points...

> I think your suggested way of going about this is completely wrong. 

Roger that. 

>  It would probably be easier to do on-the-fly image conversion without
>  trying to use a one-to-one map from NetPBM filters to Apache filters;

Not only easier... necessary. The way you are actually designing
your filtering, there can NEVER be a one-to-one mapping with
other useful code like NetPBM. Apache-style filter wrappers will
always have to be written for anyone to use such code.

The NetPBM thing is just an EXAMPLE. It works for me because
I made it work. It was what I needed. It won't ever work with the 
current Apache filtering design approach without major changes
being made to the entire NetPBM source code. Pity.

The only reason I was even mentioning it was to make some
reasonable argument that static-only filter ordering is just 
too limiting to even be really useful.

>  instead there would be one Apache filter for generic image filtering
>  which initially looks at the image header to decide how to set up the
>  NetPBM filters, and after that forms an encapsulation around all the
>  NetPBM processing that makes it look like a single-stage process to
>  Apache. I.e. one filter that does one big thing well, rather than lots of
>  small filters that each do one small thing well.

Looks good on paper... but you are again just dipping the big toe
into the water and hoping you won't have to dive in. Thinking that
going one little step beyond the pre-content request_rec metadata...
a quick peek at a graphics header before having to call any
write_brigade stuff... will solve all the 'who does what when'
issues is still a little unrealistic.

There are things about processing images ( like .GIF ) that 
aren't in the header(s). They only 'pop up' as the processing
goes along ( GIF89a extensions, comments, directives, crap
like that ) and can change what is supposed to happen next.
Especially on interlaced graphics or multi-frame rotating
GIFs, yada, yada.
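A minimal sketch of the point being made here, in plain C ( the
function names are illustrative, not Apache API ): a walker over an
in-memory GIF89a block stream. Note that extensions, comments and
frame descriptors are only discovered WHILE walking the data; the
13-byte header a filter could peek at up front says nothing about them.

```c
#include <stdint.h>
#include <stdio.h>

/* Skip GIF data sub-blocks: <size byte><data>... terminated by 0x00. */
static size_t skip_sub_blocks(const uint8_t *p, size_t n, size_t i)
{
    while (i < n && p[i] != 0x00)
        i += 1 + p[i];
    return i + 1;                   /* step past the block terminator */
}

/* Walk the block stream of a GIF89a image; report and count the
 * extension blocks that only 'pop up' during processing. */
int count_gif_extensions(const uint8_t *p, size_t n)
{
    int ext = 0;
    size_t i = 6 + 7;               /* "GIF89a" + logical screen descriptor */
    if (p[10] & 0x80)               /* global color table present? */
        i += 3u * (1u << ((p[10] & 0x07) + 1));

    while (i < n) {
        switch (p[i]) {
        case 0x21:                  /* extension introducer */
            printf("extension label 0x%02X at offset %zu\n", p[i + 1], i);
            ext++;
            i = skip_sub_blocks(p, n, i + 2);
            break;
        case 0x2C:                  /* image descriptor: one frame */
            i += 10;                /* introducer + 9-byte descriptor */
            if (p[i - 1] & 0x80)    /* local color table? */
                i += 3u * (1u << ((p[i - 1] & 0x07) + 1));
            i++;                    /* LZW minimum code size byte */
            i = skip_sub_blocks(p, n, i);
            break;
        case 0x3B:                  /* trailer: end of stream */
            return ext;
        default:
            return ext;             /* corrupt stream; bail out */
        }
    }
    return ext;
}
```

Feed it a one-frame GIF with a comment extension and it finds the
extension at an offset no header peek would have revealed.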

Sure... having one big-ass graphics filter that is supposed 
to do everything will absolutely WORK and makes the
filtering engine look simpler... but look at what you are
losing by doing that. No one else can use any of the
really cool sub-filters but that one big-ass filter and,
likewise, the big-ass filter has no access to other 
existing filters that might be able to help it get 
the job done.

What YOU are saying is that the filtering engine in Apache
should (must) be able to say... "You are the 'image' filter for this
MIME type... do your thing and leave me alone and don't 
ask me to insert any other mainline filters for you once 
the ball is rolling."

What I am saying is that you should consider that at any 
moment the filter you 'passed off' to will come right back at
you and say "I have looked at what you gave me and I need
you to install these additional filters into the mainline chain
for me in the exact order I tell you."

This applies to ANY filter... not just graphics stuff.
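Here is a self-contained sketch of that behaviour ( this is NOT the
Apache filter API, just a toy chain with made-up names ): a filter that
only decides what else it needs once it has seen the stream, and
splices the extra filter into the mainline chain mid-request.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct filter filter_t;
struct filter {
    const char *name;
    int (*cb)(filter_t *self, char *buf);
    filter_t *next;
};

static int pass_down(filter_t *self, char *buf)
{
    return self->next ? self->next->cb(self->next, buf) : 0;
}

/* Terminal "network" filter: just emits whatever reaches it. */
static int net_cb(filter_t *self, char *buf)
{
    (void)self;
    printf("network <- %s\n", buf);
    return 0;
}

/* A helper filter someone might want to reuse elsewhere. */
static int upper_cb(filter_t *self, char *buf)
{
    for (char *p = buf; *p; p++)
        *p = (char)toupper((unsigned char)*p);
    return pass_down(self, buf);
}

/* The content filter: it cannot know at setup time whether the
 * uppercase filter is needed. Only after looking at the data does it
 * come back and splice the extra filter into the mainline chain. */
static int inspect_cb(filter_t *self, char *buf)
{
    if (!strncmp(buf, "shout:", 6)) {
        filter_t *up = calloc(1, sizeof *up);
        up->name = "upper";
        up->cb   = upper_cb;
        up->next = self->next;      /* splice in, mid-stream */
        self->next = up;
        memmove(buf, buf + 6, strlen(buf + 6) + 1);
    }
    return pass_down(self, buf);
}
```

Build the chain as inspect -> network, push "shout:hello" through it,
and the uppercase filter gets installed by the inspect filter itself,
after processing has started... which is the whole argument.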

>  Note that this discussion is at a different level from the one about
>  content-encoding: all the metadata for that decision is available
>  before the request handler starts (so my earlier assertions remain
>  true in that case), 

Actually... not so. Even with Content-Encoding style
filtering there are times when the pre-content handling information
isn't all you need to know. You may still need/want to install
other filters into the chain to help get the job done AFTER the
processing has started.

Example: The next version of ZLIB will allow multiple packed
compression images all in the same compression object 
( sort of like a 'collage' of compression objects ) and is styled
after multi-part MIME files or Multi-frame GIF images. If 
a filter needs to 'unpack' this stuff on the fly it might need
different 'filters' to do it and it won't know what it needs
until it reaches the right point in the stream to know.

Sure... one big-ass ZLIB filter would take care of that, too,
but the sub-filters won't be usable by anyone else.

And that's just ZLIB (GZIP). Content-encoding schemes
are easily registered and God knows what the next 
scheme will need. Might not have a header at ALL.

>  whereas for your image filter part of the metadata
>  is the header of the image file and that file isn't opened until the
>  request handler starts. This is a feature that it shares with the
>  output of CGIs -- they also produce metadata "late" which may require
>  altering the filter stack (canonical example: to add SSI processing).

Exactly. You just re-iterated the only point I was trying to make.
Sometimes the totally static 'set it all up beforehand' approach
just isn't going to work.
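The canonical CGI -> SSI case can be reduced to a tiny sketch: the
content handler reads the script's own header block ( metadata that
only exists once the handler is running ) and decides then whether an
SSI filter belongs in the stack. The server-parsed type string is the
historical mod_include convention; treat the helper name as illustrative.

```c
#include <string.h>

/* Return 1 if the CGI script's emitted headers call for SSI
 * processing -- a decision that cannot be made before the request
 * handler runs, because the headers don't exist until then. */
int needs_ssi_filter(const char *cgi_headers)
{
    const char *ct = strstr(cgi_headers, "Content-Type:");
    if (!ct)
        return 0;
    return strstr(ct, "text/x-server-parsed-html") != NULL;
}
```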

>  We've also talked in the past about filtering headers, which magnifies
>  the problem: as the filters change the headers the filter stack must
>  change accordingly.

Yep. Not pretty but there it is. The smarts to do this will
have to be there sooner or later.

>  Now that I've spilled worms all over the place I'd like to put the lid
>  back on the can, but I don't know how to do it properly. I'm quite
>  strongly inclined to be conservative for 2.0 and avoid the header
>  filtering problem as much as possible. I.e. the makeup of the filter
>  stack is decided purely on the basis of HTTP metadata (i.e. we define
>  the problem so that we can ignore the headers of image files etc.) and
>  as much as possible before the content handler runs. 

This will WORK. No one ever said it wouldn't.

It will get you into the ballgame, at least,
but it will only work for limited filtering scenarios.

>  In order to support CGI -> SSI etc. the content handler can tweak 
>  the filter stack
>  late, after the content handler starts but before any calls to
>  ap_pass_brigade, but the filters themselves may not change the
>  headers. I think that is enough functionality to fulfill the main
>  goals of 2.0 and to give us the practical experience of filters that
>  we need in order to design a complete system for filtering metadata
>  that isn't a dog's dinner.

Just don't do anything to PREVENT the addition of dynamic
filter insertion sometime down the road and you will be AOK.

Otherwise you will be discussing ditching what you have later
on in favor of something new, just like you are now ready to
ditch BUFF.

>  The main problem with this plan is that there must be some special
>  allowances made for implementation -> network charset filters. This
>  code's sole purpose in life is to filter metadata (headers and chunk
>  tags) so it would seem to be excluded. However it doesn't change the
>  meaning of the data so it can be wedged in with the aforementioned
>  special allowances. We can worry about finding a properly orthogonal
>  approach later.

That will all get worked out if/when you actually tackle transfer filters.
Right now it's just all about content conversion(s).

Kevin Kiley
CTO, Remote Communications, Inc. - Online Internet Content Compression Server
