cocoon-dev mailing list archives

From Sylvain Wallez <wallez.anyw...@free.fr>
Subject Re: XML Compilation
Date Wed, 18 Oct 2000 08:55:36 GMT


Stefano Mazzocchi wrote:
> 
> Sylvain Wallez wrote:
> >
> > Great, great !
> >
> > A few suggestions to make the format more compact without going into
> > complicated compression algorithms :
> >
> > - using a byte instead of an int for SAX instruction codes would
> > divide their size by 4.
> 
> I'm already using a byte :)
> 
>  OutputStream.write(int c);
> 
> already discards the upper 24 bits (read the javadoc to find out)
> 
Ah! I forgot about that... I always found it strange to use an int
parameter to write a byte (speed concerns ?).
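
For the record, the truncation is easy to check. A minimal sketch,
assuming a ByteArrayOutputStream as the sink (WriteByteDemo and
writeAndRead are illustrative names, not anything from XMLCompiler):

```java
import java.io.ByteArrayOutputStream;

class WriteByteDemo {
    // Pushes an int through OutputStream.write(int) and returns the
    // single byte that actually lands in the stream.
    static byte writeAndRead(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(value); // only the low-order 8 bits are kept
        return out.toByteArray()[0];
    }

    public static void main(String[] args) {
        // The upper 24 bits of 0x12345678 are dropped, leaving 0x78.
        System.out.println(writeAndRead(0x12345678) == (byte) 0x78);
    }
}
```

So an opcode passed as an int still costs one byte on disk, as long as
it fits in 8 bits.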

> > - repetition of elements name and namespace can be avoided for
> > "endElement" : XMLInterpreter can hold a stack of open elements to
> > retrieve these values.
> 
> hmmm, good point, I'll try that and see if it's worth it... during
> development I found out that not all optimizations end up being such...
> for example, increasing the buffer size from 8 KB to 16 KB slows things
> down on my system (which is very strange).
> 
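
The stack idea can be sketched roughly as follows. This is only an
illustration under my own assumptions: the opcodes, the class name and
the "/name" convention for end events are hypothetical and do not
reflect the actual CXML layout. Names are written on start only, and
the reading side recovers them from a stack of open elements.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class EndElementStackSketch {
    static final int START_ELEMENT = 1; // hypothetical opcode, followed by the name
    static final int END_ELEMENT = 2;   // hypothetical opcode, carries no name

    // "Compiles" events; a leading '/' marks an end event (assumption for the demo).
    static byte[] compile(String... events) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String event : events) {
                if (event.startsWith("/")) {
                    out.write(END_ELEMENT);      // one byte, no name repeated
                } else {
                    out.write(START_ELEMENT);
                    out.writeUTF(event);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Replays the events, taking end-element names from the stack.
    static List<String> interpret(byte[] data) {
        Deque<String> open = new ArrayDeque<>();
        List<String> replayed = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int op;
            while ((op = in.read()) != -1) {
                if (op == START_ELEMENT) {
                    String name = in.readUTF();
                    open.push(name);
                    replayed.add("start:" + name);
                } else if (op == END_ELEMENT) {
                    replayed.add("end:" + open.pop());
                } else {
                    throw new IOException("unknown opcode " + op);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return replayed;
    }
}
```

The trade-off is one push and one pop per element at read time against
the bytes saved on every endElement, so it does need measuring.
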
> > - the first time a string is output, assign it a number (incrementing
> > counter from the start of the document) and output other occurrences of
> > the string as that number. Since XML is highly redundant, this would
> > save much, much space. Sure, this increases write time, but will reduce
> > read time. But this can lead to issues regarding memory consumption,
> > since it requires keeping all previously read strings.
> 
> I'm already doing this :)

Sorry, I carefully studied XMLCompiler, but went fast over
CompiledXMLxxxStream.
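
For readers following the thread, the scheme I described looks roughly
like this. It is a sketch under my own assumptions, not the code in
CompiledXMLxxxStream: the two markers, the 4-byte index and the class
name are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StringTableSketch {
    static final int NEW_STRING = 0; // hypothetical marker: literal follows
    static final int STRING_REF = 1; // hypothetical marker: table index follows

    static byte[] encode(List<String> strings) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Map<String, Integer> seen = new HashMap<>();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String s : strings) {
                Integer index = seen.get(s);
                if (index == null) {
                    seen.put(s, seen.size()); // next free number, starting at 0
                    out.write(NEW_STRING);
                    out.writeUTF(s);
                } else {
                    out.write(STRING_REF);
                    out.writeInt(index);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static List<String> decode(byte[] data) {
        List<String> table = new ArrayList<>(); // grows with every new string
        List<String> result = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int marker;
            while ((marker = in.read()) != -1) {
                if (marker == NEW_STRING) {
                    String s = in.readUTF();
                    table.add(s);
                    result.add(s);
                } else {
                    result.add(table.get(in.readInt()));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

The table in decode is where the memory concern comes from: it keeps
every distinct string until the end of the document. With these
particular choices a reference takes 5 bytes, so a very short string
such as "p" (4 bytes as a literal here) does not get any smaller.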

> 
> > Several XML compression tools are also listed on
> > http://www.oasis-open.org/cover/xmlAndCompression.html but I think we're
> > talking here more about a binary format than compression.
> 
> No, we are talking about the fastest parsable format you can compile SAX
> events in. Speed is my main concern, not size (even if, given the same
> speed, I optimized for size).
> 
> > BTW, I see several great uses for compiled XML :
> >
> > - store pre-parsed XML files on disk (in the repository) to allow fast
> > reload whenever they're thrown out of the cache. I'm not sure if it's
> > very useful for XSPs since they're already compiled into class files,
> > but it will surely be for XSLTs.
> 
> Totally, this is the main reason to write such a thing.
> 
> I forecast some "compile by xml WAR" tool inside Cocoon2 that will allow
> you to precompile all the XSPs and all the static XML documents and
> package the whole thing for production, indicating all compilation and
> validation problems.
> 
> If you think about it, XML and Java are very similar in this respect.
> 

Just as javac stores line number and filename information in class
files, what do you think about optionally storing Locator information in
the compiled XML? This would allow easier debugging when a transformer
detects an error while processing an XMLC file (not a validity error,
which would have been found during initial parsing, but a semantic or
application-level error in the data that occurs at run time).
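
To make the suggestion concrete, here is a sketch of how the compiling
side could capture positions, assuming it sits behind a standard SAX
parser. LocatorRecorder and record are made-up names, and how the
positions would be laid out in the compiled stream is left open.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

class LocatorRecorder extends DefaultHandler {
    private Locator locator;
    final List<String> records = new ArrayList<>();

    @Override
    public void setDocumentLocator(Locator locator) {
        // The parser hands over the locator before any other event.
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // A real compiler would write the line next to the event;
        // here it is simply collected as "name@line".
        int line = (locator != null) ? locator.getLineNumber() : -1;
        records.add(qName + "@" + line);
    }

    // Parses the given XML text and returns the recorded positions.
    static List<String> record(String xml) {
        try {
            LocatorRecorder handler = new LocatorRecorder();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.records;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Since the information would be optional, production builds could skip
it and keep the compiled stream as small and fast as it is today.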

> > - store element-generation code of XSPs into XML bytecode fragments
> > using byte[] variables. This will make java files much smaller and help
> > avoid reaching the 64 KB size limit for method bodies.
> 
> Yep, this is where the idea came from (I posted about this subject on
> cocoon-users a while ago).
> 
> > - Off topic WRT Cocoon, but IMO worth studying : reduce network load
> > between XML-enabled applications that understand this format. Mmmh, this
> > can be the first step of a "binary/xml" mime-type ! Once browsers accept
> > it, we can choose to send XML or XMLC depending on the HTTP Accept
> > header !!
> 
> No, that would totally suck and I'll tell you why:
> 
> 1) CXML is normally bigger than the original file. Not much, but it
> rarely compresses (since only string redundancy is eliminated).
> 

Mmmh... is it bigger because writing ints to reference the element name,
namespace URI, etc. takes more space than the average "prefix:name"
string?

> 2) textual compressors such as gzip compress XML better than CXML.
> 
> 3) XML compressors (such as XMill) perform much better than gzip even
> for well-formed documents.
> 
> CXML focuses on speed then size.
> 
> XMill focuses on size then speed.
> 
> Also, CXML is highly asymmetrical: it's much faster to interpret than to
> compile.... while for normal XML publishing, you need a fast way to
> "generate" SAX events but also a fast way to "consume" these events and
> serialize them into a stream of chars.
> 
> And my CXML format is highly biased toward generation of events rather
> than consumption.
> 

Ok, I understand your point of view. My initial idea was that a simple
encoding scheme (easily interpretable and not CPU intensive), even if
not the most efficient, is more likely to be widely adopted. But of
course, it has to have a minimum level of efficiency ;-)

> --
> Stefano Mazzocchi      One must still have chaos in oneself to be
>                           able to give birth to a dancing star.
> <stefano@apache.org>                             Friedrich Nietzsche
> --------------------------------------------------------------------
>  Missed us in Orlando? Make it up with ApacheCON Europe in London!
> ------------------------- http://ApacheCon.Com ---------------------

-Sylvain
