From: TOKILEY@aol.com
Date: Wed, 12 Nov 2003 04:41:57 EST
Subject: Re: mod_deflate and transfer / content encoding problem
To: dev@httpd.apache.org
CC: TOKILEY@aol.com

>> Andre Schild wrote:
>>
>>>>> joshua@slive.ca 31.10.2003 23:44:06 >>>
>>>
>>> On Fri, 31 Oct 2003, Andre Schild wrote:
>>>
>>>> Please have a look at the following Mozilla bug report
>>>>
>>>> http://bugzilla.mozilla.org/show_bug.cgi?id=224296
>>>>
>>>> It seems that mod_deflate does transfer encoding,
>>>> but sets the headers as if doing content encoding.
>>>
>>> I'm not an expert in this, but that statement seems completely
>>> wrong to me. Compressing the content is a content-encoding, not a
>>> transfer-encoding. The only thing transfer-encoding is used for in
>>> HTTP is chunking.
>>
>> Is anyone here reading this who can answer this for sure?

> Bill Stoddard replied...
>
> Compression is content-encoding not transfer-encoding. See RFC 2616
> 14.11, 14.41 & 3.6

Nope. Totally wrong. Read 3.6 again (the whole thing).

'Compression' is most certainly a valid 'Transfer-Encoding'.

[snip - From section 3.6 of RFC 2616]

   The Internet Assigned Numbers Authority (IANA) acts as a registry
   for transfer-coding value tokens. Initially, the registry contains
   the following tokens: "chunked" (section 3.6.1), "identity"
   (section 3.6.2), "gzip" (section 3.5), "compress" (section 3.5),
   and "deflate" (section 3.5).

[snip]

There are actually some clients (and servers) out there that are
RFC 2616 compliant and can handle "Transfer-Encoding: gzip, chunked"
perfectly well. Apache cannot.

>> I think we should put a warning on the second "recommended
>> configuration" that compressing everything can cause problems.
>> (Especially with PDF files)
>
> Yep.
>
> IE in particular has some really weird problems handling compressed
> SSL data streams containing JavaScript.
>
> Bill

Netscape is by far the worst when it comes to not being able to handle
certain 'compressed' mime types, even though it sends the
all-or-nothing indicator "Accept-Encoding: gzip". Netscape will even
screw up a compressed .css style sheet.

Opera is the 'most capable' browser in this regard... when Opera says
it can "Accept-Encoding: gzip" it comes the closest to not being a
'liar'. It can (almost) do anything. MSIE falls somewhere in-between.

There is no browser on the planet that is telling the truth when it
says "Accept-Encoding: gzip", once you consider that this is an ALL OR
NOTHING DEVCAPS (Device Capabilities) indicator. Unlike the "Accept:"
field for mime types, there isn't even any way for a user-agent to
indicate WHICH mime types it will be able to 'decompress', so unless
it can absolutely handle a compressed version of ANY mime type it
really has no business sending "Accept-Encoding: gzip" at all.

There is also no way, in the current HTTP specs, for a server to
distinguish between "Content-Encoding" and "Transfer-Encoding" as far
as what the client really means it can or can't do. When a user-agent
says "Accept-Encoding: xxxx", a server can only assume that the client
can handle the 'xxxx' encoding for EITHER "Content-Encoding:" OR
"Transfer-Encoding:". There's just no way to tell the difference: the
same HTTP request field is supposed to be good for BOTH 'Transfer' and
'Content' encoding. Again... if a user-agent cannot handle 'xxxx'
encodings for all mime types for BOTH "Content-Encoding:" AND
"Transfer-Encoding:", then it has no business ever sending
"Accept-Encoding: xxxx" in the first place. This is, of course, far
from reality.

As far as a user-agent being able to tell the difference between
"Content-Encoding:" and "Transfer-Encoding:" coming from the server,
the only thing you can rely on is the response headers themselves.
The 'compressed' BODY data will always be identical regardless of
whether it's actually...

   Content-Encoding: gzip
   Transfer-Encoding: chunked

or simply...

   Transfer-Encoding: gzip, chunked

It should make no difference to the 'receiver'. The user-agent will
still be receiving a gzip compression stream with chunking bytes
injected into it, and the decompression step is the same for both
scenarios: you have to dechunk first, then decompress. If you pass the
stream data to a ZLIB decompression routine without first removing the
chunking bytes, then ZLIB is going to blow sky-high.

This is the reason most browsers are screwing up. They are actually
passing the response stream off to an embedded mime-handler (like an
Adobe plug-in for PDF files, or some ill-coded JavaScript engine,
etc.) that has no idea how to remove the chunking bytes AND get it
decompressed.
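Just to make the ordering concrete, here is a rough sketch in Python
(illustrative pseudo-code only, not Apache code; the dechunk() helper
and the sample wire bytes are invented for the example) of the only
decode order that works:

    import gzip

    def dechunk(raw: bytes) -> bytes:
        # Strip HTTP/1.1 chunked framing:
        #   <hex-size>CRLF <data> CRLF ... 0 CRLF CRLF
        out, pos = bytearray(), 0
        while True:
            eol = raw.index(b"\r\n", pos)
            size = int(raw[pos:eol].split(b";")[0], 16)  # ignore extensions
            if size == 0:            # last-chunk; trailers not handled here
                return bytes(out)
            out += raw[eol + 2 : eol + 2 + size]
            pos = eol + 2 + size + 2  # skip chunk data plus trailing CRLF

    # Fake wire bytes: gzip applied first, then chunking on top.
    payload = gzip.compress(b"Hello, world.")
    wire = hex(len(payload))[2:].encode() + b"\r\n" + payload + b"\r\n0\r\n\r\n"

    # Correct order: de-chunk FIRST, then decompress.
    print(gzip.decompress(dechunk(wire)))   # b'Hello, world.'

    # The broken plug-ins effectively do gzip.decompress(wire) --
    # handing zlib a stream that still has chunk-size bytes in it.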
The only way for all of this to currently work 'out there' is to never
rely on the "Accept-Encoding:" field at all, and to always keep a
'list' of excluded mime types tied to 'User-Agent', since the browsers
are all different as far as what they are able to do (or not do). See
the sketch in the P.S. below. It's still a classic 'DEVCAPS' (Device
Capabilities) issue, and the current HTTP headers just don't solve the
problem(s).

You can't really trust "User-Agent" much, either. Not sure it's ever
been trustworthy. A scrape-bot might be imitating Netscape request
headers right down to the "Accept-Encoding:" field so it doesn't get
blocked out, but if you actually send it compressed data it's probably
going to puke. Most search-bots and scrape-bots are simply imitating
Netscape browsers, but they can by no means actually "Accept-Encoding"
(other than chunked), and there's really no way to know that a bot is
lying.

This is all certainly not news to the developers on this forum, but I
thought the fellow who first asked the question might want to see some
of the issues involved.

Yours...
Kevin Kiley
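P.S. Here is a rough sketch (Python pseudo-code; the table entries,
the DEFAULT_EXCLUDE set, and the should_compress() name are all made
up for illustration, not anybody's real list) of the kind of
User-Agent keyed exclusion table I mean:

    EXCLUDE = {
        # Order matters: MSIE also sends "Mozilla/4.0 (compatible;
        # MSIE ...)", so test for it before the Netscape 4 marker.
        "MSIE":       {"application/pdf"},
        "Mozilla/4.": {"text/css", "application/pdf",
                       "application/x-javascript"},
    }
    DEFAULT_EXCLUDE = {"application/pdf"}  # unknown UAs (e.g. bots)

    def should_compress(user_agent, content_type, accept_encoding):
        # Never trust "Accept-Encoding: gzip" alone; consult the
        # per-browser exclusion list as well.
        if "gzip" not in accept_encoding.lower():
            return False
        for ua_marker, bad_types in EXCLUDE.items():
            if ua_marker in user_agent:
                return content_type not in bad_types
        return content_type not in DEFAULT_EXCLUDE

    # Netscape 4 gets no compressed CSS; MSIE 6 does.
    print(should_compress("Mozilla/4.76 [en] (X11; U; Linux 2.4)",
                          "text/css", "gzip"))   # False
    print(should_compress("Mozilla/4.0 (compatible; MSIE 6.0)",
                          "text/css", "gzip"))   # True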