cocoon-dev mailing list archives

From Stefano Mazzocchi
Subject Re: lowercase XML languages
Date Thu, 14 Mar 2002 23:34:54 GMT
Ross Burton wrote:
> On Wed, 2002-03-13 at 14:38, Robert Koberg wrote:
> > That's a good one... The person who told you that was either pulling
> > your leg, or just trying to make up an explanation. In fact, just the
> > opposite would be true, if anything. Since the upper case letters appear
> > first in ASCII order, you "could" save a little bit of space by trimming
> > off the 7th bit, but unfortunately, not cleanly. Trying to do anything
> > to further compress 7bit ASCII down would essentially result in a "new"
> > character table, which would not be a good thing.
> Wellll... I can see how lower-case tags could give better compression.
> It is more likely that lower-case tag names will appear in the buffer
> (I'm assuming a gzip-like compression here) than upper-case, simply
> because the tag names could appear in the text.
> e.g:
> this is a list of items: <list> blaa </list>
> The list tag names will refer to the 'list' in the character data.  If
> it were <LIST> that compression could not occur.

Exactly. If you study how the Lempel-Ziv compression algorithms work
(they are used inside both gzip and zip), it will be evident to you that
lower-case HTML compresses better than upper-case HTML, mostly because
the text inside the HTML is generally lowercase, so lower-case strings
are more frequent and the tag names are more likely to repeat them.
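
A rough way to check this is to compress the same markup twice, once
with lower-case and once with upper-case tag names, and compare the
sizes. The sketch below uses Python's zlib module, which implements the
same DEFLATE (LZ77 plus Huffman) scheme that gzip uses. The element
names and the sample sentences are made up for illustration, and the
exact byte counts will depend on the input, so treat it as a toy
experiment rather than a benchmark.

```python
import zlib

# Hypothetical element names, chosen only for illustration.
WORDS = ["list", "table", "title", "section",
         "paragraph", "header", "footer", "image"]


def build_document(upper_case_tags: bool) -> str:
    """Build markup whose character data mentions each element name in lower case."""
    lines = []
    for word in WORDS:
        tag = word.upper() if upper_case_tags else word
        lines.append(
            f"this {word} holds some text about the {word}: "
            f"<{tag}> blaa </{tag}>\n"
        )
    return "".join(lines)


def deflate_size(markup: str) -> int:
    """Return the size in bytes of the zlib (DEFLATE) stream for the markup."""
    return len(zlib.compress(markup.encode("ascii"), 9))


lower_doc = build_document(upper_case_tags=False)
upper_doc = build_document(upper_case_tags=True)
lower_size = deflate_size(lower_doc)
upper_size = deflate_size(upper_doc)

print(f"uncompressed:    {len(lower_doc)} bytes in both variants")
print(f"lower-case tags: {lower_size} bytes compressed")
print(f"upper-case tags: {upper_size} bytes compressed")
```

With lower-case tags the compressor can encode each opening tag name as
a back-reference to the same word in the preceding text. With upper-case
tags the first occurrence of each name has to be emitted as literals,
and the extra capital letters also enlarge the symbol alphabet.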

Anyway, I got this explanation from Tim Berners-Lee directly, but I
don't remember where I read it.

Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
                               Friedrich Nietzsche
