From: TOKILEY@aol.com
Date: Wed, 12 Nov 2003 04:41:57 EST
Subject: Re: mod_deflate and transfer / content encoding problem
To: dev@httpd.apache.org
CC: TOKILEY@aol.com

>> Andre Schild wrote:
>>
>>>>> joshua@slive.ca 31.10.2003 23:44:06 >>>
>>>
>>> On Fri, 31 Oct 2003, Andre Schild wrote:
>>>
>>>> Please have a look at the following Mozilla bug report
>>>>
>>>> http://bugzilla.mozilla.org/show_bug.cgi?id=224296
>>>>
>>>> It seems that mod_deflate does transfer encoding,
>>>> but sets the headers as if doing content encoding.
>>>
>>> I'm not an expert in this, but that statement seems completely
>>> wrong to me. Compressing the content is a content-encoding, not a
>>> transfer-encoding. The only thing transfer-encoding is used for in
>>> HTTP is chunking.
>>
>> Is anyone here reading this who can answer this for sure?

> Bill Stoddard replied...
>
> Compression is content-encoding not transfer-encoding. See RFC 2616
> 14.11, 14.41 & 3.6

Nope. Totally wrong. Read 3.6 again (the whole thing).

'Compression' is most certainly a valid 'Transfer-Encoding'.

[snip - From section 3.6 of RFC 2616]

   The Internet Assigned Numbers Authority (IANA) acts as a registry
   for transfer-coding value tokens. Initially, the registry contains
   the following tokens: "chunked" (section 3.6.1), "identity"
   (section 3.6.2), "gzip" (section 3.5), "compress" (section 3.5),
   and "deflate" (section 3.5).

[snip]

There are actually some clients (and servers) out there that are
RFC 2616 compliant and can handle "Transfer-Encoding: gzip, chunked"
perfectly well. Apache cannot.

>> I think we should put a warning on the second "recommended
>> configuration" that compressing everything can cause problems.
>> (Especially with PDF files)
>
> Yep.
>
> IE in particular has some really weird problems handling compressed
> SSL data streams containing JavaScript.
>
> Bill

Netscape is by far the worst when it comes to not being able to handle
certain 'compressed' mime types, even though it sends the
all-or-nothing indicator "Accept-Encoding: gzip". Netscape will even
screw up a compressed .css style sheet.

Opera is the 'most capable' browser in this regard... when Opera says
it can "Accept-Encoding: gzip" it comes the closest to not being a
'liar'. It can (almost) do anything. MSIE falls somewhere in-between.

There is no browser on the planet that is telling the truth when it
says "Accept-Encoding: gzip", once you consider that this is an ALL OR
NOTHING DEVCAPS (Device Capabilities) indicator. Unlike the "Accept:"
field for mime types, there isn't even any way for a user-agent to
indicate WHICH mime types it will be able to 'decompress', so unless
it can absolutely handle a compressed version of ANY mime type it
really has no business sending "Accept-Encoding: gzip" at all.

There is also no way, in the current HTTP specs, for a server to
distinguish between "Content-Encoding" and "Transfer-Encoding" as far
as what the client really means it can or can't do. When a user-agent
says "Accept-Encoding: xxxx", a server can only assume that the client
can handle the 'xxxx' encoding for EITHER "Content-Encoding:" OR
"Transfer-Encoding:". There's just no way to tell the difference: the
same HTTP request field is supposed to be good for BOTH 'Transfer' and
'Content' encoding. Again... if a user-agent cannot handle 'xxxx'
encodings for all mime types for BOTH "Content-Encoding:" AND
"Transfer-Encoding:", then it has no business ever sending
"Accept-Encoding: xxxx" in the first place. This is, of course, far
from reality.

As far as a user-agent being able to tell the difference between
"Content-Encoding:" and "Transfer-Encoding:" coming from the server,
the only thing you can rely on is the response headers themselves.
The 'compressed' BODY data will always be identical regardless of
whether it's actually...

   Content-Encoding: gzip
   Transfer-Encoding: chunked

or simply...

   Transfer-Encoding: gzip, chunked

It should make no difference to the 'receiver'. The user-agent will
still be receiving a gzip compression stream with chunking bytes
injected into it, and the decompression step is the same for both
scenarios: you have to dechunk first, then decompress. If you pass the
stream data to a ZLIB decompression routine without first removing the
chunking bytes, then ZLIB is going to blow sky-high.

This is the reason most browsers are screwing up. They are actually
passing the response stream off to an embedded mime-handler (like an
Adobe plug-in for PDF files, or some ill-coded JavaScript engine,
etc.) that has no idea how to remove the chunking bytes AND get it
decompressed.
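Just to make the ordering concrete, here is a rough sketch in Python
(illustrative pseudo-code only, not Apache code; the dechunk() helper
and the sample wire bytes are invented for the example) of the only
decode order that works:

    import gzip

    def dechunk(raw: bytes) -> bytes:
        # Strip HTTP/1.1 chunked framing:
        #   <hex-size>CRLF <data> CRLF ... 0 CRLF CRLF
        out, pos = bytearray(), 0
        while True:
            eol = raw.index(b"\r\n", pos)
            size = int(raw[pos:eol].split(b";")[0], 16)  # ignore extensions
            if size == 0:            # last-chunk; trailers not handled here
                return bytes(out)
            out += raw[eol + 2 : eol + 2 + size]
            pos = eol + 2 + size + 2  # skip chunk data plus trailing CRLF

    # Fake wire bytes: gzip applied first, then chunking on top.
    payload = gzip.compress(b"Hello, world.")
    wire = hex(len(payload))[2:].encode() + b"\r\n" + payload + b"\r\n0\r\n\r\n"

    # Correct order: de-chunk FIRST, then decompress.
    print(gzip.decompress(dechunk(wire)))   # b'Hello, world.'

    # The broken plug-ins effectively do gzip.decompress(wire) --
    # handing zlib a stream that still has chunk-size bytes in it.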
The only way for all of this to currently work 'out there' is to never
rely on the "Accept-Encoding:" field at all, and to always keep a
'list' of excluded mime types tied to 'User-Agent', since the browsers
are all different as far as what they are able to do (or not do). See
the sketch in the P.S. below. It's still a classic 'DEVCAPS' (Device
Capabilities) issue, and the current HTTP headers just don't solve the
problem(s).

You can't really trust "User-Agent" much, either. Not sure it's ever
been trustworthy. A scrape-bot might be imitating Netscape request
headers right down to the "Accept-Encoding:" field so it doesn't get
blocked out, but if you actually send it compressed data it's probably
going to puke. Most search-bots and scrape-bots are simply imitating
Netscape browsers, but they can by no means actually "Accept-Encoding"
(other than chunked), and there's really no way to know that a bot is
lying.

This is all certainly not news to the developers on this forum, but I
thought the fellow who first asked the question might want to see some
of the issues involved.

Yours...
Kevin Kiley
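P.S. Here is a rough sketch (Python pseudo-code; the table entries,
the DEFAULT_EXCLUDE set, and the should_compress() name are all made
up for illustration, not anybody's real list) of the kind of
User-Agent keyed exclusion table I mean:

    EXCLUDE = {
        # Order matters: MSIE also sends "Mozilla/4.0 (compatible;
        # MSIE ...)", so test for it before the Netscape 4 marker.
        "MSIE":       {"application/pdf"},
        "Mozilla/4.": {"text/css", "application/pdf",
                       "application/x-javascript"},
    }
    DEFAULT_EXCLUDE = {"application/pdf"}  # unknown UAs (e.g. bots)

    def should_compress(user_agent, content_type, accept_encoding):
        # Never trust "Accept-Encoding: gzip" alone; consult the
        # per-browser exclusion list as well.
        if "gzip" not in accept_encoding.lower():
            return False
        for ua_marker, bad_types in EXCLUDE.items():
            if ua_marker in user_agent:
                return content_type not in bad_types
        return content_type not in DEFAULT_EXCLUDE

    # Netscape 4 gets no compressed CSS; MSIE 6 does.
    print(should_compress("Mozilla/4.76 [en] (X11; U; Linux 2.4)",
                          "text/css", "gzip"))   # False
    print(should_compress("Mozilla/4.0 (compatible; MSIE 6.0)",
                          "text/css", "gzip"))   # True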