httpd-dev mailing list archives

From Brian Pane <brian.p...@cnet.com>
Subject Re: mod_blank development
Date Sun, 13 Oct 2002 18:33:46 GMT
On Sun, 2002-10-13 at 04:47, fabio rohrich wrote:
> HI!
> I wrote you last time about my development of a new
> apache module.
> 
> mod_blanks: a module for the Apache web server which would on-the-fly 
> remove unnecessary blank space, comments and other non-interesting 
> things from the served page.  Skills needed: the C language, a bit of 
> text parsing techniques, HTML, learn Apache API.  Complexity: low to 
> moderate (after learning the API).  Usefulness: moderate to low (but 
> maybe better than that, it's a kind of nice toy topic that could be 
> shown to save a lot of bandwidth on the Internet :-).
> 
> So, the question is. I'm developing it for my bachelor thesis
> and my teacher told me it's too easy to develop it.
> So, have you some ideas, like something to do more (something
> like compression) or other things to add in the module.

If you want to stick with the mod_blanks idea but make it
more advanced (so that it's complicated enough to be
a thesis project), here are a few ideas:

  * Removing extra spaces/comments/etc from HTML while delivering
    it is a good idea, but it's not necessarily something that
    you want your web server to do on every request.  If you
    deliver the same page a hundred times per day (or a hundred
    times per second), it's wasteful to keep doing the same
    parsing work on the same file over and over.  So one
    possibility is: make the module smart enough to cache
    the "optimized" versions of pages.

  * Another challenge with mod_blanks is that there is a
    tradeoff between bandwidth cost and hardware cost.  If you
    do a lot of processing to reduce the bytes sent (removing
    extraneous spaces, compression, etc), it will reduce your
    bandwidth cost, but you'll have to spend more on server
    hardware.  And if your server suddenly gets a lot of
    traffic, it might be able to handle the extra load, but
    not if it also has to do all the mod_blanks processing
    (the same idea applies to mod_deflate also).  So one idea
    that might be interesting is:  Let the server administrator
    define which optional filters can be skipped when the server
    is heavily loaded.  (An "optional" module in this situation
    would mean something that we could skip without causing a
    bad response to be sent to the client.  So mod_deflate counts
    as optional, for example, but mod_include doesn't.)  Then,
    during request processing, decide whether to run the
    optional filters based on how overloaded the server is.

  * One more idea: do some research to determine which is
    faster: removing blanks and comments, or just compressing
    the HTML.  Or, to put it another way, build mod_blanks and
    compare its performance to mod_deflate.  Mod_blanks would
    have an advantage, because it can use simpler and faster
    code.  On the other hand, mod_deflate also has an advantage
    because it will result in a smaller block of bytes being
    written to the socket, which usually will reduce the CPU
    time spent in the kernel.  Which one will win?  Or is it
    better to do both: eliminate spaces and comments, and also
    compress?
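
To make the ideas above a bit more concrete, here is a rough sketch in
plain C of the two pieces they rely on: the whitespace/comment stripping
itself, and a load check for deciding whether to run optional filters.
Both function names (blanks_minify, run_optional_filters), the
busy-worker ratio, and the threshold parameter are made up for
illustration; none of this is Apache API, and a real filter would work
on bucket brigades and would have to leave <pre>, <script>, and
<textarea> content alone.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not an Apache API): copy `in` to `out`, dropping
 * HTML comments and collapsing each run of whitespace to one space.
 * `out` must hold at least strlen(in) + 1 bytes.
 * Returns the number of bytes written, excluding the terminating NUL.
 * Simplification: whitespace-sensitive elements are not special-cased.
 */
size_t blanks_minify(const char *in, char *out)
{
    size_t n = 0;
    int in_space = 0;   /* nonzero while inside a whitespace run */

    while (*in != '\0') {
        if (strncmp(in, "<!--", 4) == 0) {
            const char *end = strstr(in + 4, "-->");
            if (end == NULL)
                break;          /* unterminated comment: drop the rest */
            in = end + 3;       /* skip past the closing "-->" */
            continue;
        }
        if (isspace((unsigned char)*in)) {
            if (!in_space)
                out[n++] = ' '; /* emit one space per whitespace run */
            in_space = 1;
        } else {
            out[n++] = *in;
            in_space = 0;
        }
        in++;
    }
    out[n] = '\0';
    return n;
}

/*
 * Hypothetical policy: run optional filters only while the fraction of
 * busy workers is below `threshold` (for example 0.8). How the busy
 * count is obtained is left open here.
 */
int run_optional_filters(unsigned busy_workers, unsigned max_workers,
                         double threshold)
{
    if (max_workers == 0)
        return 0;
    return (double)busy_workers / (double)max_workers < threshold;
}
```

Comparing the return value of blanks_minify() with strlen() of the
input gives the bytes saved per page, which is the number you would
want to set against the mod_deflate result in the comparison above.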

Brian
