cocoon-dev mailing list archives

From Sylvain Wallez
Subject Re: XML Compilation
Date Tue, 17 Oct 2000 12:45:26 GMT
Great, great !

A few suggestions to make the format more compact without going into
complicated compression algorithms :

- using a byte instead of an int for SAX instruction codes would divide
their size by 4.
- repeating the element name and namespace can be avoided for
"endElement" : XMLInterpreter can hold a stack of open elements to
retrieve these values.
- the first time a string is output, assign it a number (incrementing
counter from the start of the document) and output other occurrences of
the string as that number. Since XML is highly redundant, this would
save much, much space. Sure, this increases write time, but it will
reduce read time. It can however lead to memory consumption issues,
since the reader has to keep all previously read strings.
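
The three suggestions above can be sketched roughly as follows. This is
only an illustration in Java, not the actual CXML format (which isn't
described in this thread): the class names CompactWriter and
CompactReader, the opcode values and the NEW_STRING/STRING_REF markers
are all made up for the example, attributes are left out, and the reader
returns a list of strings where a real XMLInterpreter would fire SAX
events.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical writer: one-byte opcodes plus a string table filled on first use. */
final class CompactWriter {
    // Assumed opcode values; each one fits in a single byte instead of an int.
    static final byte START_ELEMENT = 1, END_ELEMENT = 2, CHARACTERS = 3;
    // Marker telling the reader whether a literal string or a table index follows.
    static final byte NEW_STRING = 0, STRING_REF = 1;

    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(bytes);
    private final Map<String, Integer> table = new HashMap<>();

    private void writeString(String s) throws IOException {
        Integer id = table.get(s);
        if (id == null) {
            // First occurrence: number it (counter from document start), write it in full.
            table.put(s, table.size());
            out.writeByte(NEW_STRING);
            out.writeUTF(s);
        } else {
            // Later occurrences: write only the number.
            out.writeByte(STRING_REF);
            out.writeInt(id);
        }
    }

    void startElement(String uri, String name) throws IOException {
        out.writeByte(START_ELEMENT);
        writeString(uri);
        writeString(name);
    }

    /** No name or namespace is written: the reader recovers them from its stack. */
    void endElement() throws IOException {
        out.writeByte(END_ELEMENT);
    }

    void characters(String text) throws IOException {
        out.writeByte(CHARACTERS);
        writeString(text);
    }

    byte[] toByteArray() {
        return bytes.toByteArray();
    }
}

/** Hypothetical reader: rebuilds the string table and tracks open elements. */
final class CompactReader {
    static List<String> replay(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        List<String> strings = new ArrayList<>();   // every string read so far
        Deque<String[]> open = new ArrayDeque<>();  // stack of {uri, name}
        List<String> events = new ArrayList<>();
        while (in.available() > 0) {
            byte op = in.readByte();
            switch (op) {
                case CompactWriter.START_ELEMENT: {
                    String uri = readString(in, strings);
                    String name = readString(in, strings);
                    open.push(new String[] {uri, name});
                    events.add("start {" + uri + "}" + name);
                    break;
                }
                case CompactWriter.END_ELEMENT: {
                    String[] element = open.pop();
                    events.add("end {" + element[0] + "}" + element[1]);
                    break;
                }
                case CompactWriter.CHARACTERS:
                    events.add("chars " + readString(in, strings));
                    break;
                default:
                    throw new IOException("unknown opcode " + op);
            }
        }
        return events;
    }

    private static String readString(DataInputStream in, List<String> strings)
            throws IOException {
        if (in.readByte() == CompactWriter.NEW_STRING) {
            String s = in.readUTF();
            strings.add(s); // index matches the number the writer assigned
            return s;
        }
        return strings.get(in.readInt());
    }
}
```

Note that the reader's string list only ever grows, which is the memory
issue mentioned above. writeUTF also caps a string at 65535 encoded
bytes, so a real format would need its own string encoding.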

Several XML compression tools are also listed on but I think we're
talking here more about a binary format than compression.

BTW, I see several great uses for compiled XML :

- store pre-parsed XML files on disk (in the repository) to allow fast
reload whenever they're thrown out of the cache. I'm not sure if it's
very useful for XSPs since they're already compiled into class files,
but it will surely be for XSLTs.

- store element-generation code of XSPs into XML bytecode fragments
using byte[] variables. This will make java files much smaller and help
avoid hitting the 64k size limit for method bodies.

- Off topic WRT Cocoon, but IMO worth studying : reduce network load
between XML-enabled applications that understand this format. Mmmh, this
can be the first step of a "binary/xml" mime-type ! Once browsers accept
it, we can choose to send XML or XMLC depending on the HTTP Accept
header !!
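
The Accept-header idea could look something like this on the server
side. A minimal sketch only: "binary/xml" is the hypothetical type name
from the paragraph above and isn't a registered media type,
XmlcNegotiation and chooseContentType are invented names, and q-values
are ignored to keep it short.

```java
/** Hypothetical helper choosing between plain XML and compiled XML. */
final class XmlcNegotiation {
    // Assumed type name for compiled XML; not a registered media type.
    static final String COMPILED_TYPE = "binary/xml";
    static final String PLAIN_TYPE = "text/xml";

    /** Returns the compiled type only if the client lists it in its Accept header. */
    static String chooseContentType(String acceptHeader) {
        if (acceptHeader != null) {
            for (String range : acceptHeader.split(",")) {
                // Drop parameters such as ";q=0.9" and compare the bare type.
                String type = range.split(";", 2)[0].trim();
                if (type.equalsIgnoreCase(COMPILED_TYPE)) {
                    return COMPILED_TYPE;
                }
            }
        }
        // Clients that don't ask for the compiled form get ordinary XML.
        return PLAIN_TYPE;
    }
}
```

Falling back to plain XML whenever the type isn't listed means existing
clients keep working without any change.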

My 0.02 euro.


Stefano Mazzocchi wrote:
> Hi,
> I have implemented an idea I had in mind for a while: XML bytecode
> compilation.
> In short, an XML file is parsed by a regular parser (and possibly
> validated against a schema), then compiled into a binary form that is
> easier to parse. You can think of it as the XML equivalent of java
> bytecode compilation.
> Then the document is read by an XML interpreter which behaves exactly as
> an XML SAX parser (so you can plug it into your code with no changes
> whatsoever), but it's much faster since it doesn't have to do any real
> parsing: it just iterates over compiled SAX events and fires them.
> The results are very interesting: speed improvement goes from 16000% for
> very small files (100 bytes) to 45% for big files (650Kb) over Xerces 1.2
> in non-validating SAX mode.
> XML compilation is quite fast (300 millis for the 33Kb file, 2500 millis
> for the 650Kb file) and doesn't increase the size by much (3% bigger
> for the 650Kb file).
> See the attached files for the complete results. The tests I used are
> attached as well.
> The code is written as a test but it's carefully optimized for speed
> without any particular JVM trick (only algorithmic optimizations such
> as string pooling and faster unicode encoding).
> I'm releasing it under the Cocoon APL. If interesting, I will write a
> description of the CXML file format and release that as well.
> The package is not yet identified, suggestions are welcome.
> --
> Stefano Mazzocchi      One must still have chaos in oneself to be
>                           able to give birth to a dancing star.
> <>                             Friedrich Nietzsche
> --------------------------------------------------------------------
>  Missed us in Orlando? Make it up with ApacheCON Europe in London!
> ------------------------- http://ApacheCon.Com ---------------------
>   ------------------------------------------------------------------------
>                          Name:
>    Type: Zip Compressed Data (application/x-zip-compressed)
>                      Encoding: base64
