cocoon-dev mailing list archives

From Stefano Mazzocchi
Subject Re: XML Compilation
Date Tue, 17 Oct 2000 15:21:05 GMT

Sylvain Wallez wrote:
> Great, great !
> A few suggestions to make the format more compact without going into
> complicated compression algorithms :
> - using a byte instead of an int would divide the size of SAX instruction
> codes by 4.

I'm already using a byte :)

 OutputStream.write(int c);

already discards the upper 24 bits (read the javadoc to find out)
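
A minimal sketch of that behaviour, in case it helps. The class and method names are made up for illustration and are not part of the compiler code; only `OutputStream.write(int)` itself is the real API.

```java
import java.io.ByteArrayOutputStream;

class ByteWriteDemo {
    // Pushes one int through OutputStream.write(int) and returns what was stored.
    static byte[] writeOne(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(value); // only the low-order 8 bits reach the stream
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] stored = writeOne(0x12345678);
        System.out.println(stored.length);                         // 1
        System.out.println(Integer.toHexString(stored[0] & 0xFF)); // 78
    }
}
```

So an opcode passed as an int still costs a single byte in the output.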

> - repetition of elements name and namespace can be avoided for
> "endElement" : XMLInterpreter can hold a stack of open elements to
> retrieve these values.

hmmm, good point, I'll try that and see if it's worth it... during
development I found out that not all optimizations end up being such...
for example, increasing the buffer size from 8 KB to 16 KB slows things
down on my system (which is very strange).
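
For reference, the suggestion could be sketched roughly like this. It is only an illustration under my own assumptions: `OpenElementStack` and its methods are hypothetical names, not code from XMLInterpreter, and whether it actually pays off still needs measuring.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class OpenElementStack {
    // One entry per open element: {namespace URI, local name, qualified name}.
    private final Deque<String[]> open = new ArrayDeque<>();

    // Called when a startElement instruction is interpreted.
    void startElement(String uri, String localName, String qName) {
        open.push(new String[] { uri, localName, qName });
    }

    // Called when an endElement instruction is interpreted: the names come
    // from the stack, so the compiled stream does not need to repeat them.
    // Throws NoSuchElementException if no element is open.
    String[] endElement() {
        return open.pop();
    }

    public static void main(String[] args) {
        OpenElementStack stack = new OpenElementStack();
        stack.startElement("", "page", "page");
        stack.startElement("urn:example", "title", "x:title");
        System.out.println(stack.endElement()[2]); // x:title
        System.out.println(stack.endElement()[2]); // page
    }
}
```

The trade is one push and one pop per element at read time against three fewer string references per endElement in the stream.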

> - the first time a string is output, assign it a number (incrementing
> counter from the start of the document) and output other occurrences of
> the string as that number. Since XML is highly redundant, this would
> save much, much space. Sure, this increases write time, but will reduce
> read time. But this can lead to issues regarding memory consumption,
> since it requires to keep all previously read strings.

I'm already doing this :)
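
A rough sketch of the general technique, not of the actual CXML byte layout (which this message does not spell out). The `-1` marker, the 4-byte indices and all the names below are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StringTableDemo {
    // Writer side: a string is written in full the first time it is seen,
    // and as its table index on every later occurrence.
    static byte[] encode(String[] strings) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            Map<String, Integer> seen = new HashMap<>();
            for (String s : strings) {
                Integer index = seen.get(s);
                if (index == null) {
                    seen.put(s, seen.size()); // next free index
                    out.writeInt(-1);         // assumed marker: new string follows
                    out.writeUTF(s);
                } else {
                    out.writeInt(index);      // back-reference to an earlier string
                }
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reader side: rebuilds the same table in the same order, which is why
    // every string read so far has to stay in memory.
    static List<String> decode(byte[] data, int count) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            List<String> table = new ArrayList<>();
            List<String> result = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                int index = in.readInt();
                if (index == -1) {
                    String s = in.readUTF();
                    table.add(s);
                    result.add(s);
                } else {
                    result.add(table.get(index));
                }
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] names = { "page", "title", "page", "title" };
        System.out.println(decode(encode(names), names.length)); // [page, title, page, title]
    }
}
```

The reader only does an indexed lookup for repeated strings, while the writer pays for a hash lookup on every string, which matches the write-slower, read-faster trade-off described in the quoted suggestion.
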
> Several XML compression tools are also listed on
> but I think we're
> talking here more about a binary format than compression.

No, we are talking about the fastest parsable format you can compile SAX
events in. Speed is my main concern, not size (even if, given the same
speed, I optimized for size).

> BTW, I see several great uses for compiled XML :
> - store pre-parsed XML files on disk (in the repository) to allow fast
> reload whenever they're thrown out of the cache. I'm not sure if it's
> very useful for XSPs since they're already compiled into class files,
> but it will surely be for XSLTs.

Totally, this is the main reason to write such a thing.

I forecast some "compile by xml WAR" tool inside Cocoon2 that will allow
you to precompile all the XSPs and all the static XML documents and
package the whole thing for production, indicating all compilation and
validation problems.

If you think about it, XML and Java are very similar in this respect.

> - store element-generation code of XSPs into XML bytecode fragments
> using byte[] variables. This will make java files much smaller and help
> avoiding reaching the 64k size limit for methods bodies.

Yep, this is where the idea came from (I posted about this on
cocoon-users a while ago).

> - Off topic WRT Cocoon, but IMO worth studying : reduce network load
> between XML-enabled applications that understand this format. Mmmh, this
> can be the first step of a "binary/xml" mime-type ! Once browsers accept
> it, we can choose to send XML or XMLC depending on the HTTP Accept
> header !!

No, that would totally suck and I'll tell you why:

1) CXML is normally bigger than the original file. Not much, but it
rarely compresses (since only string redundancy is eliminated).

2) textual compressors such as gzip compress XML better than CXML.

3) XML compressors (such as XMill) perform much better than gzip even
for well-formed documents.

CXML focuses on speed first, then size.

XMill focuses on size first, then speed.

Also, CXML is highly asymmetrical: it's much faster to interpret than to
compile.... while for normal XML publishing, you need a fast way to
"generate" SAX events but also a fast way to "consume" these events and
serialize them into a stream of chars.

And my CXML format is highly biased toward generation of events rather
than consumption.

Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
                               Friedrich Nietzsche
 Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------
