Return-Path:
Delivered-To: apmail-new-httpd-archive@apache.org
Received: (qmail 26676 invoked by uid 500); 8 Sep 2000 17:44:44 -0000
Mailing-List: contact new-httpd-help@apache.org; run by ezmlm
Precedence: bulk
X-No-Archive: yes
Reply-To: new-httpd@apache.org
list-help:
list-unsubscribe:
list-post:
Delivered-To: mailing list new-httpd@apache.org
Delivered-To: moderator for new-httpd@apache.org
Received: (qmail 98686 invoked from network); 7 Sep 2000 14:30:57 -0000
To: new-httpd@apache.org
Subject: Re: [PATCH] filtering and canned error responses
References:
From: Jeff Trawick
Date: 07 Sep 2000 10:30:45 -0400
In-Reply-To: rbb@covalent.net's message of "Sat, 2 Sep 2000 08:35:20 -0700 (PDT)"
Message-ID:
Lines: 126
X-Mailer: Gnus v5.5/Emacs 20.3
X-Spam-Rating: locus.apache.org 1.6.2 0/1000/N

rbb@covalent.net writes:

> > I disagree completely with the premise that all filters act on tags in
> > a fashion similar to mod_include. That is a debilitating requirement.
> > Certain filters can work that way and certain ones can't. It is
> > better to disallow filtering on the error strings than to require that
> > filters sanity-check their input data (not all filters can even do
> > that). (Besides, the less code between an error message and the
> > network the better... Sending an error message should be relatively
> > simple.)
>
> But who's to say that we don't want all filters acting on those errors? I
> see a few cases, filters that add data to all output, filters that add
> data on tags, and filters that modify some characteristic of the
> data.
>
> Tags are not an issue in this case, because we all agree that any
> errors Apache generates won't have tags.

What is a tag? That is for the filter to decide (if indeed the
filter's processing is tag-based).

> Adding data to all output (I'm thinking header and trailer documents that
> identify the site, or add a logo, or something like that) may want to be
> used on error documents.
>
> Filters that modify the data (I'm thinking charset or automatic language
> translation) are very useful, especially on error pages. These aren't
> connection filters, they are content filters.

True. That is an area where we have breakage now.

Suppose the requested object is /cgi-bin/1047/gobble.
mod_charset_lite has been told to translate text/* output in
/cgi-bin/1047/ from IBM1047 to 8859-1. Apache determines that there
is no such URI and generates an error message. That error message is
then translated by mod_charset_lite according to the information that
was available prior to when we realized that /cgi-bin/1047/gobble
didn't exist. The result is a meaningless translation and an
unreadable canned error message.

When a redirection occurs (i.e., the administrator has specified
ErrorDocument 404 /errors/404.html), then modules will be given
information about /errors/404.html and the set of filters will be
built appropriately and the configuration used by the filters will be
the appropriate configuration for /errors/404.html (not the
configuration for /cgi-bin/1047/gobble).

When a redirection does not occur (no ErrorDocument directive for the
error or an ErrorDocument directive which specifies the error message
directly), how do we avoid a filter using the wrong configuration
directives? (Not that there are any configuration directives for such
canned error messages.)

Recap:

  redirection: magic happens; modules know what is going on
  no redirection: potential GIGO, 'cause filters don't know what to do

Some filters only act on tags, and if they aren't present they'll pass
the data on through with no breakage. Some filters do not act on
tags; all they know to do is to perform a certain transformation on
every byte of input.

Some problems:

1) If we're on an EBCDIC machine, how do we translate canned error
   messages (i.e., no ErrorDocument-prescribed redirection) from the
   code page of the source code to 8859-1?
   This is similar to another problem -- translating HTTP header
   fields and chunk headers/trailers from the code page of the source
   code to 8859-1, so it can be solved in the same way (some
   indication in the bucket of what is going on?).

2) What if we absolutely must send canned error messages through
   certain required filters? One example is language translation,
   which you mentioned above. Another example is an HTML->WML
   translator (not that I know how a module would know when to install
   that filter :) ).

   I don't know. Mandatory redirection isn't cool because you still
   need the simple fallback method of generating the message when the
   redirection fails.

>
> Of the three types I can see for filters, one isn't an issue, and the
> other two may want to filter the error doc.
>
> > If the administrator wants filter processing on an error document she
> > can use a redirect, so there is no big loss here.
>
> But why bother with this? What is the problem we are trying to solve with
> this? To me, it seems very straightforward that the error docs are put
> into a bucket brigade with an EOS, because we know that the doc is done, and
> we send it down the filter stack. This doesn't try to avoid the filter
> logic we already have and it should work. Where is the issue?
>
> > I'm not sure I understand "canned errors generated by mistakes in
> > SSIs." Are you talking about a failed subrequest?
>
> Take a look at mod_include. We return the string "[an error occurred
> while processing this directive]" whenever we encounter an error in
> processing an SSI tag. We don't want to send those out without filtering
> them. Also, how does this affect sub-requests? If I have data that is
> being buffered by some filter, and a filter higher-up sends a canned-error
> response, won't the output be all messed up, because it won't wait for the
> data sitting in the lower filter?

The mod_include case is different than the cases affected by my patch.
In the mod_include case you refer to, we are delivering the URI which
was requested. There may be the occasional mod_include error message,
but any filters in the chain after mod_include are processing the
output that they expect (according to their hooks which were called
prior to the content handler).

--
Jeff Trawick | trawick@ibm.net | PGP public key at web site:
     http://www.geocities.com/SiliconValley/Park/9289/
          Born in Roswell... married an alien...
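
Illustration of the "some indication in the bucket of what is going
on?" idea from problem 1: a rough sketch only, and not the Apache
bucket or filter API. Every name here (toy_bucket, server_generated,
toy_translate_byte, toy_charset_filter) is hypothetical, and ROT13
merely stands in for a real code page translation such as IBM1047 to
8859-1. The point is that a byte-oriented filter cannot sanity-check
its input, so it needs an explicit hint to leave canned text alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy bucket; NOT the real Apache bucket structure. */
struct toy_bucket {
    char data[128];
    int server_generated;   /* assumed flag: 1 for canned error text */
};

/* Stand-in for a per-byte charset translation (ROT13 on ASCII letters). */
static char toy_translate_byte(char c)
{
    if (c >= 'a' && c <= 'z')
        return (char)('a' + (c - 'a' + 13) % 26);
    if (c >= 'A' && c <= 'Z')
        return (char)('A' + (c - 'A' + 13) % 26);
    return c;
}

/*
 * Byte-oriented filter: transforms every byte of the bucket in place,
 * unless the bucket is flagged as server-generated, in which case the
 * data is passed through untouched.
 */
static void toy_charset_filter(struct toy_bucket *b)
{
    size_t i;

    assert(b != NULL);
    if (b->server_generated)
        return;
    for (i = 0; b->data[i] != '\0'; i++)
        b->data[i] = toy_translate_byte(b->data[i]);
}
```

With this sketch, a content bucket holding "hello" comes out
translated, while a bucket flagged server_generated passes through
unchanged. Whether such a flag belongs in the bucket or somewhere else
is exactly the open question above.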