httpd-dev mailing list archives

Subject Re: [PATCH] mod_deflate extensions
Date Tue, 19 Nov 2002 14:51:41 GMT

>> Peter J. Cranstone wrote...
>> Since when does web server throughput drop by x% factor using
>> mod_deflate?
> Jeff Trawick wrote...
> I don't think you need me to explain the "why" or the "when" to you.

Think again.

Exactly what scenario are you assuming is supposed to
be so 'obvious' that it doesn't need an explanation or discussion?

There has never been a good discussion or presentation
of real data on this topic... just a bunch of 'assumptions'...
and now that compression modules can cache their output,
whatever testing HAS been done needs to be done again,
because any/all of the sore spots in anyone's testing may now be
completely eliminated by real-time caching of compressed objects.
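As a rough illustration of what such caching looks like, here is a minimal Python sketch. The `compressed_body()` helper and the ETag-style validator key are hypothetical names invented for this example; this is not mod_gzip's or mod_deflate's actual API. The point is only that the CPU cost of compression is paid once per version of an object, not once per request:

```python
import zlib

# Illustrative cache: (path, validator) -> compressed bytes.
_cache = {}

def compressed_body(path, etag, body):
    """Compress `body` at most once per (path, etag) version."""
    key = (path, etag)
    if key not in _cache:
        _cache[key] = zlib.compress(body, 6)  # CPU cost paid only here
    return _cache[key]

page = b"<html>" + b"hello world " * 1000 + b"</html>"
first = compressed_body("/index.html", "v1", page)
again = compressed_body("/index.html", "v1", page)  # served from cache
assert first is again  # second request did no compression work
```

Under this scheme a hot page costs the server essentially nothing beyond the first compression, which is why cached results can erase the "sore spots" earlier benchmarks may have measured.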

All of my experience with compressing Internet Content in
real time on Servers, with or without caching of the compressed
objects, indicates that if done correctly it USUALLY
does nothing but INCREASE the 'throughput' of the Server.

The same experience has also shown that if something ends
up being much SLOWER, then something is badly WRONG with
the code that's doing it, and it is FIXABLE.

The assumption that YOU seem to be clinging to is that once
the Server has bounced through enough APR calls to handle
the transaction with as few things showing up in STRACE
as possible that the Server has done its job and the
transaction is OVER ( and the CPU somehow magically free
again ).

This is never the case.

Pie is rarely free at a truck stop.

If you dump 100,000 bytes into the I/O subsystem without
taking the (few) milliseconds needed to compress them down
by 70-80 percent, then SOMETHING in the CPU is still
working MUCH harder than it has to.

The 'data' is not GONE from the box just because the
Server has made some socket calls and gone
about its business. It still has to be SENT, one
byte at a time, by the same CPU in the same machine.

NIC cards are interrupt driven.

Asking the I/O subsystem to constantly push the 70-80
percent of the data that compression could have eliminated,
via an interrupt-driven mechanism, is basically the most
expensive thing you could ask the CPU to do.
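A back-of-envelope sketch of the arithmetic above. The 75 percent figure is an assumed, illustrative reduction in the 70-80 percent range typical for HTML/text, not a measurement:

```python
# Assumed, illustrative numbers: a 100,000 byte response and a
# 75% size reduction from compressing it before transmission.
uncompressed = 100_000          # bytes handed to the I/O subsystem as-is
reduction = 0.75                # assumed typical ratio for HTML/text
compressed = int(uncompressed * (1 - reduction))
saved = uncompressed - compressed

# Every saved byte is a byte the interrupt-driven NIC path never
# has to queue, copy, or signal completion for.
print(compressed, saved)        # -> 25000 75000
```

A few milliseconds of in-memory compression thus removes three quarters of the work the interrupt-driven send path would otherwise perform.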

In-memory compression is NOT interrupt driven.

As compared to interrupt-driven I/O, it is one of the
LEAST expensive things to ask the CPU to do, on average.

Do not confuse the performance of any given standard distribution
of some legacy compression library called ZLIB with whether
or not, in THEORY, the real-time compression of content
is able to INCREASE the throughput of the Server.

ZLIB was never designed to be used as a 'real-time'
compression engine. The code is VERY OLD and is
still based on a streaming I/O model with heavy
overhead versus direct in-memory compression.

It is a FILE based implementation of LZ77 and
while it performs very well in a batch job against
disk files it still lacks some things which could
qualify it as a high-performance real-time
compression engine.
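The streaming-versus-buffer distinction can be seen even through Python's zlib binding, used here purely as an illustration (this is not mod_gzip's engine, and the 4 KB chunk size is an arbitrary choice): the streaming model pays per-call overhead for every chunk fed through a compressor object, while the in-memory model compresses a whole buffer that is already in RAM in a single call.

```python
import zlib

payload = b"<p>some repetitive markup</p>" * 2000

# Streaming model: feed the data through a compressor object chunk
# by chunk -- the shape zlib's file/stream-oriented API encourages.
c = zlib.compressobj(6)
streamed = b"".join(c.compress(payload[i:i + 4096])
                    for i in range(0, len(payload), 4096)) + c.flush()

# In-memory model: one call over the complete buffer.
oneshot = zlib.compress(payload, 6)

# Both produce a valid stream that inflates back to the input.
assert zlib.decompress(streamed) == payload
assert zlib.decompress(oneshot) == payload
```

Both paths yield correct output; the argument in this thread is about the per-call and buffering overhead of the streaming shape when the content is already sitting in server memory.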

mod_gzip does NOT use 'standard ZLIB' for this
very reason. The performance was not good enough
to produce consistently good throughput.

>> We went through this debate with mod_gzip and it doesn't hold much
>> water. Server boxes are cheap and adding some more ram or even a faster
>> processor is a cheap price to pay when compared to customer satisfaction
>> when their pages load faster.
> Your "Server boxes are cheap" comment is very telling; if I add more
> ram or a faster processor we aren't talking about the same web server.


Regardless of the fact that content compression at the
origin CAN actually 'improve' the throughput of a single
server ( if done correctly ), let me chime in on this point
and say: if adding a little hardware, or perhaps even
another ( dirt cheap these days ) Server box, is what it
takes to provide a DRAMATIC improvement in the user
experience, then what's the gripe?

If that's what it takes to provide a better experience for the
USER, then I agree 100% with Peter. That SHOULD be the
focus. Your point of view seems to indicate that you believe
it's better to let your USERS have a 'worse experience' than
they need to, just to avoid having to beef up the Server side.

I have always believed that the END USER experience should
be more important than how some single piece of software
'looks' on a benchmark test. The benchmarks that produce
these holy TPS ratings are usually flawed when it comes
to imitating a REAL user experience.

It's a classic argument and there have always been
2 camps...

Which is more important...

1. Having a minimal amount of Server to deal with/maintain,
and letting the users suffer more than they need to.

2. Doing whatever it takes to make sure all the technology
that is currently available is put into play to
provide the best USER experience possible.

I have always pitched my tent in camp # 2 and I think
most people that are serious about hosting Web sites
circle their wagons around the same camp.

> But overall I agree completely that compressing content and adding
> more ram and/or a faster processor as appropriate is the right thing
> to do in many situations.


This was Peter's sole point and is now mine also.

It's the RIGHT thing to do.

None of the fine public domain and/or commercial products
that provide real-time content compression services are
so bad as to render them unusable, so there really isn't
much excuse NOT to use them.

I recommend any/all of them.

Sure... they can all get better... but so can HTTP itself.

Whatever is wrong with mod_deflate can be fixed, filter I/O
and/or compression engine performance included.

Kevin Kiley
