httpd-dev mailing list archives

From TOKI...@aol.com
Subject Re: mod_deflate and transfer / content encoding problem
Date Thu, 13 Nov 2003 03:15:40 GMT
>My reading of RFC 2616 is that Accept-encoding is only for
>content-codings.

You are right. Brain fart on my part.

I am still not sure how the discussion about mod_deflate
has gotten anywhere near "Transfer-Encoding:".

mod_deflate is NOT DOING TRANSFER ENCODING.

Was it you that suggested it was or the original
fellow who started the thread?

Content-encoding: gzip
Transfer-encoding: chunked

Cannot be interpreted as 'using Transfer encoding'.

That would be...

Transfer-encoding: gzip, chunked.

Is someone saying that's what they are actually
SEEING coming out of Apache? God... I hope not.

Bug-city.

>Clients should indicate their ability to handle
>transfer-codings via TE.

Yep... except a Server may always ASSUME that a client
can handle Transfer-encoding if it says it's HTTP 1.1
compliant. There's no need for a TE header at all.

The only caveat is that you can only assume a TE
encoding/decoding capability of "chunked".

Anything other than 'chunked' has to be indicated
with a TE header... you are right.

Problem here is that what I said about 'not knowing'
is still sort of true... it just didn't come out right.

A Server still has no way of knowing if the original
requestor can handle TE, or not.

The TE header is a 'hop by hop' header.
Might have come from the original requestor, might not.
There's no way to know.

And that's OK... that's all that TE: was designed for.
It's all based on the NN ( Nearest Neighbor )
concept and is a property of the 'message', not the 'content'.
It's just part of that strange mixture of transport AND
presentation layer concepts that is modern day HTTP.

Even if it shows up ( very rare ) the TE header is actually
SUPPOSED to be 'removed' by the NN ( Nearest Neighbor )...

[snip - From RFC 2616]

13.5.1 End-to-end and Hop-by-hop Headers

   For the purpose of defining the behavior of caches and non-caching
   proxies, we divide HTTP headers into two categories:

      - End-to-end headers, which are  transmitted to the ultimate
        recipient of a request or response. End-to-end headers in
        responses MUST be stored as part of a cache entry and MUST be
        transmitted in any response formed from a cache entry.

      - Hop-by-hop headers, which are meaningful only for a single
        transport-level connection, and are not stored by caches or
        forwarded by proxies.

   The following HTTP/1.1 headers are hop-by-hop headers:

      - Connection
      - Keep-Alive
      - Proxy-Authenticate
      - Proxy-Authorization
      - TE
      - Trailers
      - Transfer-Encoding
      - Upgrade

   All other headers defined by HTTP/1.1 are end-to-end headers.

[snip]
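What the RFC asks of a proxy there can be sketched as a toy (illustrative only, not Apache or any real proxy; the function name is invented and the header list comes straight from the section quoted):

```python
# Hop-by-hop headers per RFC 2616 section 13.5.1 -- these are properties
# of one connection and must not be forwarded or cached.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
}

def strip_hop_by_hop(headers):
    """Return only the end-to-end headers from a {name: value} dict."""
    # RFC 2616 14.10: any header NAMED in Connection is hop-by-hop too
    named = {h.strip().lower()
             for h in headers.get("Connection", "").split(",") if h.strip()}
    return {name: value for name, value in headers.items()
            if name.lower() not in HOP_BY_HOP and name.lower() not in named}
```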

"Hop-by-hop headers, which are meaningful only for a single
transport-level connection, and are not stored by caches or
forwarded by proxies."

The above is, of course, not what's really going on 'out there'
in the real world but it's all hit or miss and you still
can't be sure what's being 'forwarded' and what isn't.

I have certainly seen inline proxies 'forwarding' hop-by-hop
headers and 'caches' storing them, as well.

It's a jungle out there. ROFL.

If a Server really wants to be sure compressed content
is being sent all the way along the response chain
( including that critical 'last mile' ) to the original
requestor then the only choice is still to just
use 'Content-Encoding'... even if there is no
static representation or the page is totally
dynamic and doesn't even exist until it's asked for.
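For what it's worth, the whole 'Content-Encoding' approach for dynamic content boils down to something like this. A minimal sketch of the idea only, not mod_deflate's actual code; the function name and header choices are mine:

```python
import gzip

def compressed_response(body_text):
    """Gzip a freshly generated body (it never existed on disk) and
    label it with end-to-end headers so the compression survives
    every hop down to the original requestor."""
    body = gzip.compress(body_text.encode("utf-8"))
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Encoding": "gzip",       # end-to-end: proxies must forward it
        "Content-Length": str(len(body)),
        "Vary": "Accept-Encoding",        # different bytes per client
    }
    return headers, body
```

Because Content-Encoding is an end-to-end header, a compliant proxy has to pass it along untouched... which is exactly the guarantee TE can't give you.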

ASIDE: Maybe that's where someone is getting
confused about TE versus CE? When HTTP was
designed the whole CONTENT concept was
based on disk files and file extensions and
MIME types and whatnot but that's not how
things have evolved. "Content-type" is now
more a 'concept' than a physical reality.
There are gigabytes of "Content" these days
that doesn't even exist until someone asks
for it and it's NEVER represented on disk at all.

There's just no way to know if your 'last mile'
is covered with TE capability, or not.

The alternative to using the "Content-encoding:"
voodoo to get compressed representations of
non-compressed resources all the way down to a
client would be to have some sort of 'end-to-end'
TE header which says 'the content was compressed
because the original requestor says he wants it
that way so just pass it through'... but that
ain't gonna happen anytime soon.

>>Content-Encoding: gzip
>>together with
>>Transfer-Encoding: chunked
>>
>>or simply...
>>
>>Transfer-Encoding: gzip, chunked.
>>
>>It should make no difference to the 'receiver'.
>>
>
>Well, not if the receiver is a caching proxy...

Personally... I still don't think that matters
much. I am of the opinion that it's always
'cheaper' to store compressed versions of
entities and then just decompress it if you
need to which means a proxy SHOULD just go ahead
and remove the chunking and stash the response
regardless of whether it came in as 'Content-encoded'
compression or 'Transfer-encoded' compression...
but that's just me.

Actually... I firmly believe any decent proxy cache
should always have BOTH compressed and non-compressed
variants hot-to-go for anyone asking for
anything but I'm not going to go there for now.

What I meant before was that there is 'no difference' when
the rubber meets the road and it's time to 'decompress'.

The fellow who asked the original question is
an end-point user who wants to see the transport(ed)
data turned into presentation layer data (correctly).
Caching the transport(ed) stuff is another world.

If I handed you a floppy with a compressed + chunked
data stream capture but with NO HEADER INFO then
there's no way you could ever say for sure if it was
sent with...

Content-encoding: gzip
Transfer-encoding: chunked

or just simply...

Transfer-encoding: gzip, chunked

...and it wouldn't matter a hoot when it comes
to decompressing it. You would still
have all the information you need.
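You can actually demonstrate the floppy scenario (a toy sketch; the chunking helpers are invented for illustration): strip the chunk framing and what's left is the same gzip stream no matter which header claimed the compression.

```python
import gzip

def make_chunked(payload, chunk_size=8):
    """Wrap raw bytes in HTTP/1.1 chunked framing."""
    out = b""
    for i in range(0, len(payload), chunk_size):
        chunk = payload[i:i + chunk_size]
        out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    return out + b"0\r\n\r\n"             # last-chunk terminator

def unchunk(stream):
    """Remove chunked framing, returning the original payload."""
    out, rest = b"", stream
    while True:
        size_line, rest = rest.split(b"\r\n", 1)
        size = int(size_line, 16)
        if size == 0:
            return out
        out, rest = out + rest[:size], rest[size + 2:]  # skip trailing CRLF

# the 'floppy capture': compressed, then chunked -- no headers in sight
wire = make_chunked(gzip.compress(b"the original page"))
recovered = gzip.decompress(unchunk(wire))
```

Nothing in the decompression step knows or cares whether the wire bytes were labeled 'Content-encoding: gzip' plus 'Transfer-encoding: chunked' or 'Transfer-encoding: gzip, chunked'.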

>As far as I understand it, mod_deflate's practice of returning
>Content-encoding: gzip but using Transfer-encoding: gzip semantics (e.g.
>conditionally compressing a resource and using the same ETag for both
>the compressed and non-compressed forms of that resource) is potentially
>poisonous to proxies that handle Range.

You bet.

I could argue with your 'semantics' but I'm probably
already on the high end of the httpd-dev email-too-long
length-o-meter. They don't like long emails around here.

Semantics aside... Apache is NOT using Transfer-Encoding.
Period. There might be bugs with ETag but mod_deflate
is not TRYING to get anywhere near 'Transfer-Encoding'.
See above. It's just using the ol' 'Content-encoding: gzip'
fakery.

ETag is a problem. No doubt.

There are VERY few caching servers that come anywhere
near implementing "Vary:" correctly. Most of them just
treat the presence of any "Vary:" header at all as
'poison' all by itself and they will refuse to cache
ANYTHING that "Varies" because they simply don't have
the code onboard to know what the f__k to do with it.

Even servers that have already implemented some form
of "Vary" still won't actually cache 2 different
'fresh' representations of the same URI. They might
replace one with another and get it right but they
still don't know how to store BOTH.
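Storing BOTH isn't rocket science, conceptually. Here's a toy sketch of Vary-keyed storage; the class and its method names are all invented, and no real cache works this way verbatim:

```python
class VaryCache:
    """Toy cache keyed by URI *plus* the request-header values named
    in the response's Vary, so the gzipped and the plain variant of
    one URI can both sit in cache at the same time."""

    def __init__(self):
        self.entries = {}

    def _key(self, uri, vary, request_headers):
        # build a selecting-header tuple from the Vary field names
        selecting = tuple(sorted(
            (h.strip().lower(), request_headers.get(h.strip().lower(), ""))
            for h in vary.split(",") if h.strip()))
        return (uri, selecting)

    def store(self, uri, vary, request_headers, body):
        self.entries[self._key(uri, vary, request_headers)] = body

    def lookup(self, uri, vary, request_headers):
        return self.entries.get(self._key(uri, vary, request_headers))
```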

BTW: Did you know that regardless of cache-control
directives most browsers will still keep a representation
of a compressed response hanging around in local
browser cache... which begins to make "Vary:" and
"ETag" look like they don't work? 

The reason... simple... ALL modern browsers are using
their own local caches to actually DECOMPRESS the
data. They create temporary copies of the response
that look like real cache entries and use them as 
work files to do the decompression. Netscape actually
keeps BOTH the original compressed version AND
the decompressed version in its local cache. If you
ask for the URI again then it goes for the decompressed
variant. However... if you ask Netscape to PRINT the
page it goes totally brain-dead and forgets it has two
different representations in its own cache and it tries
to print the original compressed version on the printer.

Whoops!

MSIE is a little better... but not much. It will ERASE
the 'compressed' representation when the ZLIB thread
is done decompressing it... but it still goes brain-dead
regarding any cache-control directives and keeps the
decompressed representation sitting in the cache even
if it's not supposed to.

Like I said... it's a jungle out there. All you can do from
the Apache perspective is follow the RFCs and wait for
the bug reports and prove that it isn't your fault and
eventually things will get better.

Meter just ran out... I'm sure.

Later...
Kevin


