perl-modperl mailing list archives

From Raymond Wan <>
Subject Re: Best filesystem type for mod_cache in reverse proxy?
Date Wed, 26 Nov 2008 16:14:54 GMT

Hi Michael,

Michael Peters wrote:
> Raymond Wan wrote:
>> I had looked at the effect compression has on web pages a while ago.
>> Though not relevant to mod_perl, there is obviously a cost to
>> compression, and since most HTML pages are small, it is sometimes
>> hard to justify.
> Not to discredit the work you did researching this, but a lot of
> people are studying the same thing and coming to different conclusions:
> yes, backend performance matters, but more and more we realize that
> front-end tweaks give better performance for users.
> Take Google as an example. The overhead of compressing their content
> and decompressing it in the browser takes less time than sending the
> same content uncompressed over the network. I'd say the same is true
> for most other applications too.

It's OK; I don't consider another opinion as discrediting my work.  :-)
It was a while ago, it was only one aspect of my work, and the test bed
was small.  My fault for handwaving in my reply, though.

The key word is actually "sometimes"...  My research was on general
compression, and web compression was only one aspect of it.  My point is
that if you take a one-byte file and run gzip -9 on it (gzip uses the
same DEFLATE algorithm), you get a file of about 24 bytes.  As the file
size increases, you reach a point where compression becomes beneficial.
Though my example is both silly and pathological, it shows that there
are cases where compression may not help.  You can think of a site's
average file size as a kind of knob: as it turns up (average file size
increasing as you go from site to site), the benefits of compression
become more and more evident.
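As a rough illustration of that break-even point, here is a small Python sketch using the standard gzip module (the same DEFLATE algorithm); the fixed gzip header and trailer mean tiny payloads actually grow (the sample sizes and filler text are mine, not from any measurement):

```python
import gzip

# The gzip container adds a fixed header and trailer (~18 bytes plus a
# few bytes of DEFLATE block overhead), so tiny payloads get bigger.
phrase = b"the quick brown fox jumps over the lazy dog "
for n in (1, 100, 10_000):
    payload = (phrase * (n // len(phrase) + 1))[:n]
    out = gzip.compress(payload, compresslevel=9)
    print(f"{n:>6} bytes -> {len(out):>6} bytes compressed")
```

Running it shows the one-byte case expanding while larger inputs shrink, which is the whole "knob" argument in miniature.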

For example, compressing an already-compressed file is generally
pointless (if it was done right the first time).  MP3, JPEG, GIF, etc.
are all file formats that incorporate (or may incorporate) compression.
PDFs can be compressed too, if that option was selected when the file
was created.  English text compresses well (down to around 25% of the
original, in general?), while two-byte encodings such as Chinese and
Japanese (I think) end up around 40-50% [handwaving again :-) there are
more up-to-date numbers out there].  Also, compression works best on a
uniform file; if a web page mixes text, images, etc., then each part has
to be compressed individually.
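To make the already-compressed point concrete, here is a small sketch; os.urandom stands in for the statistically random bytes of a JPEG or MP3 payload (the sample text is made up for illustration):

```python
import gzip
import os

text = ("Apache handles each request in a worker process. " * 200).encode()
random_bytes = os.urandom(10_000)  # proxy for already-compressed data

for label, data in (("redundant text", text),
                    ("incompressible bytes", random_bytes)):
    ratio = len(gzip.compress(data, compresslevel=9)) / len(data)
    print(f"{label}: {ratio:.2f} of original size")
```

The redundant text shrinks dramatically, while the random bytes come out slightly *larger* than they went in, since DEFLATE can only fall back to storing them verbatim plus overhead.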

As for Google, you are right -- I can imagine why it works well for
them.  However, I can also hypothesize that it is a special case.  I
presume you mean the results of a query.  What we get back is a list of
results that are all related to each other: if you searched for
"apache2 modperl", we can expect those two words to appear in every
result, and the vocabulary to be similar from result to result [they
would all be computer-oriented].  Since compression works by reducing
redundancy, their results are a perfect fit for it.
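That redundancy argument can be sketched too; the result snippets below are fabricated for illustration, but they mimic how a results page repeats the query terms:

```python
import gzip

# Ten made-up result snippets that all repeat the query terms, the way
# a search results page does.
snippets = [
    f"Result {i}: apache2 modperl tutorial - configuring apache2 "
    f"with modperl handlers"
    for i in range(10)
]
page = "\n".join(snippets).encode()

ratio = len(gzip.compress(page, compresslevel=9)) / len(page)
print(f"results page compresses to {ratio:.0%} of original size")
```

Because the same terms recur in every line, DEFLATE's back-references eliminate most of the page.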

Anyway, what I wanted to say is that there are cases where compression
is beneficial and cases where it isn't.  I think it is fine to do what
the Yahoo site says and leave it "on" by default; but if someone
examines the traffic and data and decides it should be "off", that
isn't beyond reason.

>> As for dialup, if I remember from those dark modem days :-)
> Even non-dialup customers can benefit. Many "broadband" connections
> aren't very fast, especially in rural areas (I'm thinking of large
> portions of the US).
> But all this talk is really useless in the abstract. Take a tool like
> YSlow for a spin and see how your sites perform with and without
> compression, especially in the waterfall display.

Well, one good thing about DEFLATE is that it is *fast*.  Very fast.
So, while my silly one-byte file example shows there are exceptions,
the break-even point might not be far above one byte.  :-)

One cost saving might be to pre-compress files, since with DEFLATE
compression is more time-consuming than decompression -- i.e., have the
files reside on the server in compressed form.  Of course, that poses
many problems of its own and is one reason why things like Stacker
never really caught on (much)...
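A minimal sketch of the pre-compression idea, assuming static files that can be compressed once at deploy time (the document content and the timing comparison are illustrative only, not a benchmark):

```python
import gzip
import time

# Hypothetical deploy-time step: pay the expensive compression cost
# once, then serve the stored .gz bytes as-is to gzip-capable clients.
document = ("<p>mod_perl handler docs, repeated filler text.</p>\n" * 2000).encode()

t0 = time.perf_counter()
stored = gzip.compress(document, compresslevel=9)   # once, offline
t1 = time.perf_counter()
restored = gzip.decompress(stored)                  # per-request cost, if needed
t2 = time.perf_counter()

print(f"compress: {t1 - t0:.4f}s, decompress: {t2 - t1:.4f}s")
```

On typical text, decompression runs several times faster than level-9 compression, which is why shifting the compression to deploy time is attractive despite the bookkeeping headaches.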

