cocoon-dev mailing list archives

From Berin Loritsch <blorit...@apache.org>
Subject [RT:Long] Initial Results and comments (was Re: Compiling XML, and its replacement)
Date Fri, 04 Apr 2003 03:56:18 GMT
Stefano Mazzocchi wrote:
> I'll also be interested to see how different the performance gets on 
> hotspot server/client and how much it changes with several subsequent runs.

Well, with HotSpot client and a 15.4 KB (15,798 bytes) test document
(my build.xml file), I got the following results:

      [junit] Parsed 873557 times in 10005ms
      [junit] Average of 0.011453173633775472ms per parse

Compare that to a much smaller 170-byte test document:

      [junit] Parsed 16064210 times in 10004ms
      [junit] Average of 6.227508231030347E-4ms per parse


The two documents differ greatly in complexity, but the ratio of
results is:

      170b      .000623ms
   --------- = -----------
    15,800b      .0115ms

That's a size increase of 92.9 times

compared to a time increase of 18.5 times


Times with HotSpot server were comparable for this solution, although
each test was only run for 10 seconds.

Considering we have a 5:1 size-to-time scaling ratio, it would be
interesting to see if it carries over to a much larger XML file, if
only I had one.  If that ratio held, a 1,580,000 byte file (100 times
the size) should take only about .23 ms to parse (20 times the time).
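
The arithmetic behind those figures can be checked with a few lines of
Java.  This is only a sketch: the class and method names are made up
for illustration, and the extrapolation assumes the 5:1 ratio stays
constant at larger sizes, which has not been measured.

```java
/** Reproduces the arithmetic from the figures quoted above. */
final class ScalingEstimate {

    /** How many times larger the first value is than the second. */
    static double ratio(double larger, double smaller) {
        return larger / smaller;
    }

    /**
     * Extrapolates a parse time, assuming (unverified) that time grows
     * sizeToTime times more slowly than document size.
     */
    static double extrapolate(double baseMillis, double baseBytes,
                              double targetBytes, double sizeToTime) {
        double sizeGrowth = targetBytes / baseBytes;
        return baseMillis * (sizeGrowth / sizeToTime);
    }

    public static void main(String[] args) {
        double sizeRatio = ratio(15_800, 170);        // roughly 92.9
        double timeRatio = ratio(0.0115, 0.000623);   // roughly 18.5
        double sizeToTime = sizeRatio / timeRatio;    // roughly 5
        double estimate = extrapolate(0.0115, 15_800, 1_580_000, 5.0);

        System.out.println("size ratio:     " + sizeRatio);
        System.out.println("time ratio:     " + timeRatio);
        System.out.println("size-to-time:   " + sizeToTime);
        System.out.println("estimated (ms): " + estimate);
    }
}
```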

I also tried the test with the -Xint (interpreted mode only) option
set, and there was no appreciable difference.  As best I can tell,
this is largely because the code is already as optimized as it
possibly can be.  This fits in line with your observations of unrolled
"loops".

In this instance though, I believe that we are dealing with more than
just "unrolled loops".  We are dealing with file reading overhead and
interpretation overhead.  Your *compressed* XML addresses the second
issue, but in the end I believe it will behave very similarly to my
solution.

Also keep in mind that improvements in the compiler design (far future)
can allow for repetitive constructs to be moved into a separate method.
For instance, the following XML is highly repetitive:

<demo>
    <entry name="foo">
      bar
    </entry>
    <entry name="foo">
      bar
    </entry>
    <entry name="foo">
      bar
    </entry>
    <entry name="foo">
      bar
    </entry>
    <entry name="foo">
      bar
    </entry>
    <entry name="foo">
      bar
    </entry>
</demo>

As documents become very large it becomes critical to do something
other than my very simplistic compilation.  However there are plenty
of opportunities to optimize the XML compiler.  For example, we could
easily reduce the above XML to something along the lines of:

startElement("demo");

for (int i = 0; i < 6; i++)
{
      outputEntry();
}

endElement("demo");

Even if the attribute values and element values were different,
but the same structure remained, the compiler would be able
to (theoretically) reduce it to a method with parameters:

startElement("demo");

outputEntry("foo", "bar");
outputEntry("ego", "centric");
outputEntry("gas", "bag");
outputEntry("I", "am");
outputEntry("just", "kidding");
outputEntry("my", "peeps");

endElement("demo");

This would still allow for some level of HotSpot action.
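
To make the idea concrete, here is a hedged sketch of what a
parameterized outputEntry could look like when written against the
standard SAX ContentHandler interface.  It is not what the compiler
generates today; the class name CompiledDemo, the toSAX method, and
the countEntries helper are assumptions for illustration, and the
whitespace-only text nodes of the original document are left out.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical shape of compiled output; not the actual generated code. */
final class CompiledDemo {

    /** Emits one <entry name="...">text</entry> element as SAX events. */
    static void outputEntry(ContentHandler handler, String name, String text)
            throws SAXException {
        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "name", "name", "CDATA", name);
        handler.startElement("", "entry", "entry", attrs);
        char[] chars = text.toCharArray();
        handler.characters(chars, 0, chars.length);
        handler.endElement("", "entry", "entry");
    }

    /** Replays the whole document against the given handler. */
    static void toSAX(ContentHandler handler) throws SAXException {
        handler.startDocument();
        handler.startElement("", "demo", "demo", new AttributesImpl());
        outputEntry(handler, "foo", "bar");
        outputEntry(handler, "ego", "centric");
        handler.endElement("", "demo", "demo");
        handler.endDocument();
    }

    /** Replays the document and counts the entry elements it produces. */
    static int countEntries() {
        final int[] count = {0};
        try {
            toSAX(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if ("entry".equals(localName)) {
                        count[0]++;
                    }
                }
            });
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println("entries emitted: " + countEntries());
    }
}
```

Keeping the repeated structure in one small method means the same
bytecode runs for every entry, rather than a fresh straight-line copy
per element.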

However, I believe the true power of Binary XML will be with its
support for XMLCallBacks and (in the mid term future) decorators.
The decorator concept will allow us to set a series of SAX events
for a common object.  This will render the XSLT stage a moot point
as we can apply pre-styled decorators to the same set of objects.
These will call for some alterations of the compiler as it stands
now, and will be required before a 1.0 release.

I am trying to keep the library lean and mean.


