cocoon-dev mailing list archives

From Stefano Mazzocchi <>
Subject XML Compilation
Date Tue, 17 Oct 2000 10:34:50 GMT

I have implemented an idea I have had in mind for a while: XML bytecode.

In short, an XML file is parsed by a regular parser (and possibly
validated against a schema), then compiled into a binary form that is
easier to parse. You can think of it as the XML equivalent of Java
bytecode.

The compiled document is then read by an XML interpreter that behaves
exactly like a SAX parser (so you can plug it into your code with no
changes whatsoever), except that it is much faster: it does no real
parsing, it just iterates over the compiled SAX events and fires them.
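
To make the compile-then-replay idea concrete, here is a minimal sketch
in Java. It is not the CXML code or file format: the class name, the
opcodes, and the layout (one opcode byte followed by one string) are all
assumptions made up for illustration. It only uses the standard JAXP
and SAX APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

final class SaxBytecodeSketch {
    // Hypothetical opcodes for this sketch only; not the CXML encoding.
    static final int EOF = 0, ATTR_NAME = 1, ATTR_VALUE = 2, START = 3, END = 4, TEXT = 5;

    /** Appends one event: an opcode byte followed by a string operand. */
    static void emit(DataOutputStream out, int op, String operand) throws SAXException {
        try {
            out.writeByte(op);
            out.writeUTF(operand); // limited to 65535 encoded bytes per call
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    /** Parses the XML text once and records its SAX events as bytes. */
    static byte[] compile(String xml) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        final DataOutputStream out = new DataOutputStream(bytes);
        DefaultHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) throws SAXException {
                // Attributes are emitted before the START that consumes them.
                for (int i = 0; i < atts.getLength(); i++) {
                    emit(out, ATTR_NAME, atts.getQName(i));
                    emit(out, ATTR_VALUE, atts.getValue(i));
                }
                emit(out, START, qName);
            }
            @Override
            public void endElement(String uri, String local, String qName) throws SAXException {
                emit(out, END, qName);
            }
            @Override
            public void characters(char[] ch, int start, int length) throws SAXException {
                emit(out, TEXT, new String(ch, start, length));
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), recorder);
        out.writeByte(EOF);
        return bytes.toByteArray();
    }

    /** Replays compiled events into any ContentHandler; no text parsing here. */
    static void interpret(byte[] compiled, ContentHandler handler)
            throws IOException, SAXException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(compiled));
        AttributesImpl pending = new AttributesImpl();
        String attName = null;
        handler.startDocument();
        for (int op = in.readByte(); op != EOF; op = in.readByte()) {
            String s = in.readUTF();
            switch (op) {
                case ATTR_NAME: attName = s; break;
                case ATTR_VALUE: pending.addAttribute("", attName, attName, "CDATA", s); break;
                case START:
                    handler.startElement("", s, s, pending);
                    pending = new AttributesImpl();
                    break;
                case END: handler.endElement("", s, s); break;
                case TEXT: handler.characters(s.toCharArray(), 0, s.length()); break;
                default: throw new IOException("unknown opcode " + op);
            }
        }
        handler.endDocument();
    }

    /** Compiles, replays, and renders the replayed events as text. */
    static String roundTrip(String xml) {
        final StringBuilder seen = new StringBuilder();
        try {
            interpret(compile(xml), new DefaultHandler() {
                @Override public void startElement(String u, String l, String q, Attributes a) {
                    seen.append('<').append(q);
                    for (int i = 0; i < a.getLength(); i++)
                        seen.append(' ').append(a.getQName(i)).append('=').append(a.getValue(i));
                    seen.append('>');
                }
                @Override public void endElement(String u, String l, String q) {
                    seen.append("</").append(q).append('>');
                }
                @Override public void characters(char[] c, int s, int n) { seen.append(c, s, n); }
            });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen.toString();
    }
}
```

Because interpret() only needs a ContentHandler, existing SAX code can
consume the replayed events unchanged. The sketch ignores namespaces,
processing instructions, and comments, which a real implementation
would have to encode as well.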

The results are very interesting: the speed improvement over Xerces 1.2
in non-validating SAX mode ranges from 16000% for very small files
(100 bytes) to 45% for big files (650 KB).

XML compilation is quite fast (300 ms for the 33 KB file, 2500 ms for
the 650 KB file) and doesn't increase the size by much (the compiled
form is 3% bigger for the 650 KB file).

See the attached files for the complete results. The tests I used are
attached as well.

The code is written as a test, but it is carefully optimized for speed
without any JVM-specific tricks (only algorithmic optimizations such
as string pooling and faster Unicode encoding).
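
As an illustration of what string pooling can look like in a compiled
format, here is a small Java sketch. The message does not describe how
the pooling is actually done, so the class and method names below are
hypothetical: each distinct string (an element or attribute name, say)
gets a small integer index the first time it is seen, and later
occurrences can be written as that index instead of the full text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pool for illustration; not taken from the CXML code.
final class StringPoolSketch {
    private final Map<String, Integer> indexByString = new HashMap<>();
    private final List<String> strings = new ArrayList<>();

    /** Returns the index of s, assigning the next free index on first use. */
    int intern(String s) {
        Integer existing = indexByString.get(s);
        if (existing != null) {
            return existing;
        }
        int index = strings.size();
        strings.add(s);
        indexByString.put(s, index);
        return index;
    }

    /** Resolves an index back to its string, as a reader of the format would. */
    String lookup(int index) {
        return strings.get(index);
    }

    /** Number of distinct strings pooled so far. */
    int size() {
        return strings.size();
    }
}
```

In a document with thousands of repeated tag names, the writer would
store each name once and emit short indexes afterwards, and the reader
would hand back the same String instance on every lookup instead of
decoding the characters again.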

I'm releasing it under the Cocoon APL. If there is interest, I will
write a description of the CXML file format and release that as well.

The package name has not been decided yet; suggestions are welcome.

Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<>                             Friedrich Nietzsche
 Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------