httpd-docs mailing list archives

From Michael.Schro...@telekurs.de
Subject Antwort: Re: [Review] mod_deflate.xml (Revision)
Date Wed, 20 Nov 2002 15:43:28 GMT

Hi Andy,


> (According to this fact, I'm skipping some of your comments,

okay. (It was just easier for me to edit/respond to the
mail, I should have looked at the online version of course.)

>> Or you could explicitly blame Netscape 4 for doing what it does,
> ..mainly Netscape 4. And we *do* blame it (see example).

Yes, you named it below. I just wondered whether Netscape 4
should already be mentioned explicitly in this early section,
maybe with a local link to the later example.
(Actually, Netscape 4.06 seems to be the version in which they
added gzip support, possibly together with PNG support;
Netscape 4.03 doesn't send "Accept-Encoding" at all, and I even
remember a customer using 4.04 that didn't support gzip.)

>>> you may use the AddOutputFilterByType directive.
>> You might add a note about this being a valid alternative to
>> excluding Netscape 4.
> ?

The point is that Netscape 4.[5-8].* _can_ render gzipped _HTML_
content reasonably well [1], and HTML makes up the main part of
the traffic if you only look at MIME types that are worth
compressing (HTML, CSS and JS), i.e. disregarding graphics and
most binary stuff.
If so, and you have only small CSS and JS files, then excluding
_them_ from compression might be a valid alternative to excluding
Netscape 4 completely, especially if you have large and generic
HTML documents (tables and link lists, especially - by the way,
this is exactly my scenario: I'd rather take the 96% compression
for our large HTML tables and serve CSS/JS uncompressed).
And when you think that way, then including these files on the
server side so that they are compressed as well and save HTTP
requests would just be the next logical step: a compressed CSS
file that is part of a compressed HTML document is often smaller
than an HTTP response header with the 304 status.
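
To make the alternative concrete, a minimal sketch of what I
mean (untested, and the MIME type is only the obvious one -
adjust it to whatever your HTML is actually served as):

    # compress only HTML, leave CSS/JS/graphics alone, so even
    # Netscape 4.[5-8] can receive compressed pages
    AddOutputFilterByType DEFLATE text/html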

As for the [1], even with HTML content Netscape 4 _may_ fail to
handle it correctly in at least two situations:
a) it may fail to handle "view-source:" correctly (this can
   result in a blank screen, as if the file were empty)
b) it may fail to handle printing (it would then print the
   gzipped content, which is nothing but a mess).
   You can use "print preview" to check whether printing will
   work; in my experience the preview will show the gzipped
   content as well. This may save you a lot of paper. ;-)
These situations are of course similar to handling CSS and JS
from separate files: in all of these cases Netscape would have
to use the uncompressed content while only the compressed
content is stored in its cache. They somehow failed to think
about this.
Unfortunately, none of this is quite reproducible. In my
experience it mostly fails but sometimes works. I also believe
I have seen some correlation with whether Netscape 4 has its
cache enabled or not, but I cannot deterministically create
either of the two situations.
So, if print support is required, then completely excluding
Netscape 4 _may_ be unavoidable ... or you offer a "print
version" of the most important pages via some navigation
concept, served in uncompressed form, by making the compression
module decline to gzip URLs that match a certain regular
expression (like "nogzip_" or whatever - I am using this in our
production environment; see the sketch below).
We have different compression configurations for different
Netscape 4 customers (i.e. virtual hosts) because of all this.
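
For mod_deflate, such an exclusion could look roughly like this
(an untested sketch; the "nogzip_" URL marker is just my own
convention, nothing official):

    # decline compression for any URL carrying the "nogzip_"
    # marker, e.g. a separately linked printable page variant
    SetEnvIf Request_URI "nogzip_" no-gzip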

>> are mostly dynamic anyway and this did even save lots of HTTP
>> headers, which aren't compressed after all).
> btw: you know <http://httpd.apache.org/docs-2.0/mod/mod_expires.html>?
> ;-)

Sure, I am using it (well, the 1.3 version of it) extensively.
But you know how users configure their browsers? :-(

Netscape 4 did trust the "Expires:" headers without hesitation,
even if you set its caching strategy to "validate always". ;-)
Modern browsers override this, which may lead to a flood of
HTTP 304 responses if they have "validate always" selected.
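
Just for completeness, the kind of mod_expires setup I mean
(a rough sketch; the lifetimes are only examples, not
recommendations):

    # let static content be cached for a while instead of being
    # revalidated on every single request
    ExpiresActive On
    ExpiresByType image/gif "access plus 1 month"
    ExpiresByType text/css  "access plus 1 week"
    ExpiresByType text/html "access plus 1 hour"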

>> And did you ever try to send compressed PDF to a MSIE5 Acrobat
>> plugin? Now this depends ... ;-(
> ehm, no. I don't use the MSIE for normal browsing ;-)
> What happens?

It may work, and it may fail (with some nasty-looking internal
error message box). It looks like a race condition to me,
between unzipping the content and forwarding it to the plugin
(which might then just as easily affect other plugins as well).
I tried the same with Mozilla, where it worked perfectly ...

Another possible problem is using gzipped JavaScript code from
a separate file in an "onLoad" event of the HTML file ... again,
this might cause a race condition, because some MSIE versions
are said to fire the "onLoad" as soon as the JavaScript content
is _received_, not waiting until it is finally unzipped ... sigh.
(Which may then be another reason for including JavaScript on
the server side if you want to serve the content in gzipped
form.)

> However, it's easy to exclude pdf files... should we include that
> in the example?

I wish I could tell you in detail ... if I had an exact
explanation or reproduction of the effect, I would be +1 for
adding it here. So I just wanted to mention the case at all,
to encourage further experiments.
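
Excluding PDF itself would be simple enough, something like
this (a sketch only, since I cannot pin the MSIE/Acrobat effect
down):

    # don't compress PDF files at all - the Acrobat plugin in
    # some MSIE 5 installations seems to choke on gzipped
    # responses
    SetEnvIfNoCase Request_URI "\.pdf$" no-gzip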

> hmm, what can a server admin actually do?
> (possible variant:
> BrowserMatch \bMSIE MSIE
> SetEnvIf http_version HTTP/1.0 waah
> ...
> <!--#if expr="$MSIE && $waah" -->
> Enable HTTP/1.1 for compression
> <!--#endif -->
> )

Unfortunately this won't even work.

The server can detect the HTTP level of the incoming request.
But it cannot detect the HTTP level of the request the browser
sent in the first place! There might be proxies in between,
forwarding an HTTP/1.1 request as an HTTP/1.0 request. I am
running a transparent Netscape proxy doing just that. :-(

But "MSIE understanding gzip" is bundeled together with "MSIE
_trying_ to speak HTTP/1.1" - not with actually _doing_ it ...
the only thing you can detect is whether the "Accept-Encoding:"
header is there or not.
And then, you can detect whether its content is something like
"XXXXXXXXXXXX" ... you won't ever imagine the strange ideas that
these cheapo "personal firewalls" come up with: "Hey, we aren't
smart enough to support gzip but still want to parse content -
so why not just keep the browsers from requesting compressed
pages?". Outch. (http://www.schroepl.net/projekte/mod_gzip/firewalls.htm)

> If someone writes a compression howto, that would be appropriate.
> Hmm... seriously: Are you interested in writing such a document?

Is there any list of requirements of what such a paper should
include, and from which point of view the matter should be
described? Someone please collect the questions, maybe write
some table of contents, and I will be happy to add to the
chapters whatever I think I know. (Maybe this mail can be
something to start with ...)

As for the MSIE / HTTP level, there already is a document:
     http://www.schroepl.net/projekte/msie/
Visit this page with MSIE with HTTP/1.1 turned off in your
Internet Options; all other ways will be less informative.
But even in this case I am not perfectly sure about the
differences between MSIE 5 and 6 (although this document will
show different content when viewed with these two), as they
seem to depend on the Windows version as well: my Win98 PC at
home doesn't behave the same as my WinNT4 PC at the office,
both running MSIE 5.5 ... :-( My experience in this area is
based on no more than a dozen different PCs.

>> Or maybe just a note how to check whether this header actually
>> arrived from the browser ... I remember the log definitions
>> provide for HTTP request headers being accessible like
>> "%{Accept-Encoding}i", at least this worked for Apache 1.3. ;-)
> That's documented in mod_log_config.xml.

What about adding some link from here to there?
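
For the howto (or a footnote), that logging trick could be
spelled out like this (a sketch; the format nickname and log
file name are arbitrary):

    # log the Accept-Encoding request header next to each
    # request, to see what actually arrives from browsers
    # and proxies
    LogFormat "%h %t \"%r\" %>s %b \"%{Accept-Encoding}i\"" gzipcheck
    CustomLog logs/gzip_check_log gzipcheck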

>> Actually, using the UserAgent as a negotiation parameter is a
>> thing that I discourage doing (in the mod_gzip docs) because of
>> this effect. You are faced with the trade-off whom to let suffer:
>> Either serve the Netscape 4 users broken content or serve every-
>> one correct but slow content.
> +1 for slow ;-)
> We cannot recommed to serve broken content. (resp. *I* cannot...)

There are a lot of situations where you have to decide whether
to support broken third-party software or not. In this special
case it would have to be done at the expense of the vast
majority of other browsers which correctly support the
standard. There may be more than one opinion about that.

>> Squid 2.0 to 2.4 will take _any_ "Vary:" header to simply turn
>> off caching for this response - this is correct but quite a pity
>> in case of performance. As long as there are not many Squid 2.5
>> around, the main requirement is sending any "Vary:" at all; the
>> more widely used Squid 2.5 will be, the more important the
>> "Vary:" content will become.
> Hmm, a Vary header that doesn't reflect the parameters, that *vary*,
> is worthless, isn't it?

Again, this depends on how well HTTP/1.1 is handled by the
partner software.
For Squid 2.0-2.4, "Vary: *" actually has the same effect as
"Vary: Accept-Encoding,User-Agent" or whatever. Only from
Squid 2.5 on will this be different, which then requires the
most accurate "Vary:" value possible.

And no, a "Vary:" with only "Accept-Encoding" is far from
worthless ... it would protect a lot of browsers (like MSIE
running in HTTP/1.0 mode) from gzipped content, while in fact
not protecting _all_ of them (like Netscape 4). Every bit helps
and prevents potential problems.
If you have a broken proxy in the game that ignores "Vary:"
headers (like a customer of ours with some Microsoft Proxy
running with installation defaults), you are lost anyway.
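
If the example ends up switching compression per User-Agent,
the "Vary:" header should probably reflect that as well;
roughly (a sketch using mod_headers, along the lines of what
the mod_deflate page already shows):

    # tell caches that the response also depends on the
    # User-Agent, because compression is switched per browser
    Header append Vary User-Agent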

By the way: is a compressed version of a content still
considered to be the same "HTTP entity"?
I am asking because I am not sure whether gzipping the content
should make Apache create a different "ETag:" as well ...
RFC 2616 isn't very informative to me on this aspect, and the
Squid developers suggested that I consider doing it (maybe just
add some extension to the existing ETag to make it unique again
- would that rather help or hurt?).

>> But as there are other, broken proxies out there, causing a lot
>> of trouble when dealing with gzipped content, you might hint at
>> the "Via:" header to be another potential negotiation parameter
>> that is worth being taken into consideration, once you identified
>> the "broken guy". (Does Apache provide an easy means for that?)
> ehm... SetEnvIf?
> But I guess, it's similar to the User-Agent stuff (much to match...)

Agreed. I wouldn't add it to the example, but maybe add the
idea of how (and, more importantly, why!) to do it in the text.
(Maybe rather in the "Compression Howto" mentioned elsewhere.)
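
The idea itself is simple enough once you have identified the
broken proxy; something like this would do (a sketch -
"SomeBrokenProxy/1.0" is just a placeholder for whatever the
"Via:" header actually contains):

    # don't send compressed content through a proxy known to
    # mangle it
    SetEnvIf Via "SomeBrokenProxy/1\.0" no-gzip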

>> If so, the gains of using more than 6 seem to be _very_ small
>> (below the 1% rate in many cases), thus you might suggest to use
>> something in the 3 (reasonable minimum) to 6 range for installa-
>> tions that would like to save CPU power - they won't lose a lot
>> of the effect by doing so.
> is there some more mathematical stuff somewhere? (or statistics?)
> Actually I don't know enough about the compression algorithm to
> verify that...

Sorry, not me. But the general idea of gzip is to let the
user decide the trade-off between CPU power and compression
effect - which leads me to the conclusion that
a) if I have to compress some file any number of times, I
   would rather take the "cheap 99% version", whereas
b) if I have to compress it only once and store the result
   somewhere, I am much more likely to want the 100% solution
   at extra expense, as it may pay off in the long run.
This looks reasonable to me even without mathematical formulae.
Therefore I would rather not recommend using level 9 as the
default, at least not without explaining the alternatives.
(If the server idles most of the time, better to invest its
power in gzipping, but if you run applications there ...)
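
In config terms, what I would suggest documenting is roughly
this (a sketch, assuming mod_deflate exposes the zlib level
via a directive such as DeflateCompressionLevel - if it
doesn't yet, treat the name as hypothetical):

    # levels 3-6 give almost all of the gain of level 9 at a
    # fraction of the CPU cost when compressing on the fly
    DeflateCompressionLevel 6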

>> If you had the ability to cache the compressed content (like
>> gzip_cnc does, or when combining an Apache and some front-end
>> proxy Squid 2.5), _then_ 9 would be the perfect choice, but not
>> if each and every content has to be compressed again. Run bench-
>> marks on this, it may turn out quite expensive for high-traffic
>> servers.
> hmm. I someone would ask me, I would *always* recommend to use
> statically compressed content.

I am with you ... this is why gzip_cnc exists at all, and
why CK1 added the optional automatic update of statically
precompressed files to mod_gzip 1.3.26.1a: both keep the less
experienced "page maintainer" from having to think about
updating their precompressed versions every time the original
file's content changes. Just turn it on and forget about it -
this is the fool-proof solution for those who don't even have
shell access to their webspace.
Apache can do the negotiation part on its own, even without
mod_deflate ... but this isn't "the real thing" yet, IMHO.
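
What I mean by "Apache can do the negotiation part on its own"
is roughly this (a sketch, with the usual caveats - e.g. the
MIME type of the .gz variants has to be set up correctly):

    # with index.html and index.html.gz side by side, content
    # negotiation picks the compressed variant for clients
    # that send "Accept-Encoding: gzip"
    Options +MultiViews
    AddEncoding x-gzip .gz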

> compressed CGI output (like a webforum) should be tested on
> compression vs. performance. But I guess most CGI programs or PHP
> scripts actually waste more time and load than the compression ever
> would take.

Part of the "consideration" process is done in mod_gzip by
configuring lower and upper bounds for the size of content
to be compressed: With very small files, you won't gain a
lot in absolute terms, and with very large files, you may
cause load peaks and delays for serving then compressed
content.
But then, both may still be quite reasonable having the CPU
power available: Compressing very large files may produce
very large gains, and even a 400 bytes HTML file can often
be compressed by about 50%, thus reducing the total HTTP
packet size by about 25%.
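
For reference, this is how mod_gzip spells those bounds today
(from memory, so please double-check the directive names
against the mod_gzip docs); mod_deflate counterparts would
presumably look similar:

    # compress only responses between 300 bytes and ~500 KB
    mod_gzip_minimum_file_size    300
    mod_gzip_maximum_file_size 500000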

So I think this is up to the admin to decide ... these might
be two directives that would fit nicely into mod_deflate.
(They would then most likely need to be allowed in the
.htaccess context, not only in server config and virtual host;
maybe depend on some AllowOverride class as well, to easily
limit their availability to certain "service levels". Making
compression practically usable for mass hosters is a goal that
mod_gzip hasn't achieved yet either, although these companies
might well be the ones that would profit most from it.)

>> Viele Grüße
> ;-)

I got used to this on some other mailing list where I have to
respond in English to people who, I feel, can read and write
German much more fluently - this might encourage them to mail
me in German in case they have problems describing their case
in English ... somewhat similar to your signature.


Regards, Michael




