cocoon-dev mailing list archives

From Sylvain Wallez <sylv...@apache.org>
Subject Re: [RT] SAX stream buffering
Date Wed, 19 Nov 2003 09:26:28 GMT
Torsten Curdt wrote:

> Hi, folks!
>
> The numbers of the XMLByteStreamCompilerInterpreterTestCase and the 
> SaxBufferTestCase gave me some RT
> -- 
> If you have a look at the testcases it's quite obvious that the 
> SaxBuffer is *much* faster than the XMLByteStream classes. As a rule 
> of thumb, just to get a sense of scale, we could say:
>
>  XMLC/XMLI is about 15 times faster than Xerces
>  SaxBuffer is about 100 times faster than Xerces
>
> Of course this depends heavily on the document. But it should be 
> enough to grasp the magnitude, which was a bit of a surprise for me. I 
> personally did not expect this *huge* difference, especially because 
> the SaxBuffer creates many more objects than the XMLC.


I'm not very surprised by these numbers: XMLC does quite a lot of work 
to serialize Strings to bytes.

Furthermore, I just looked at XMLByteStreamCompiler.write(), which 
shows that it spends most of its time resizing the byte buffer: each 
resize grows the buffer only by the number of bytes needed for the 
current write, rather than by a larger growth increment.

It would be interesting to rerun the test with such a growth increment 
in place. BTW, I don't understand the "this.buf.length << 1" in the 
write() method.
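
To make the growth-increment point concrete, here is a minimal sketch. GrowableByteBuffer and all of its members are made-up names for illustration; this is not the XMLByteStreamCompiler code. It compares growing to the exact size needed against growing by at least a factor of two. In the sketch, "buf.length << 1" simply doubles the capacity; whether that expression plays the same role in the real write() is not something the sketch claims.

```java
import java.util.Arrays;

/** Hypothetical growable byte buffer; NOT the actual XMLByteStreamCompiler. */
final class GrowableByteBuffer {
    private byte[] buf;
    private int pos;
    private int reallocations;       // counts array copies, for comparison only
    private final boolean geometric; // true: at least double; false: exact fit

    GrowableByteBuffer(int initialCapacity, boolean geometric) {
        this.buf = new byte[Math.max(1, initialCapacity)];
        this.geometric = geometric;
    }

    void write(byte[] src) {
        int needed = pos + src.length;
        if (needed > buf.length) {
            // Exact-fit growth leaves no headroom, so nearly every later
            // write copies the whole buffer again. Doubling (length << 1)
            // makes copies rare: amortized constant cost per byte written.
            int newCapacity = geometric ? Math.max(needed, buf.length << 1) : needed;
            buf = Arrays.copyOf(buf, newCapacity);
            reallocations++;
        }
        System.arraycopy(src, 0, buf, pos, src.length);
        pos += src.length;
    }

    int size() { return pos; }

    int reallocations() { return reallocations; }

    byte[] toByteArray() { return Arrays.copyOf(buf, pos); }
}
```

Starting from 16 bytes and writing 1000 chunks of 10 bytes, the exact-fit variant reallocates on every write after the first, while the doubling variant reallocates only a handful of times, at the price of some unused capacity at the end.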

> But the huge difference between the SaxBuffer and the XMLC is that the 
> XMLC serializes the SAX event on the fly. The SaxBuffer does not 
> support serialization but keeps the events as objects.
>
> IMO spending time on the serialization only makes sense if
>
>  a) the memory consumption is too high otherwise
>  b) the SAX stream is being saved to disk
>
> Maybe we can extend the testcases to compare the memory consumption. 
> For the question of the destination we could let the store decide.
>
> Anyway, both classes make sense. But maybe they would make even more 
> sense if they shared the same interface and became 
> interchangeable.
>
> The SAX stream buffering is a vital component of cocoon. Looking at 
> the numbers, the impact on the performance could be tremendous.
>
> What do you think?


Can't we merge both: use SaxBuffer for in-memory storage, and use 
XMLC/XMLI to serialize it? This could even be done transparently by 
having SaxBuffer implement Serializable and using XMLC/XMLI to 
implement readObject() and writeObject().
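
A rough sketch of that idea, under stated assumptions: EventBuffer below is a made-up stand-in, not Cocoon's SaxBuffer, and the hand-written encoding in writeObject()/readObject() only marks the place where XMLC/XMLI would be plugged in. Events stay as plain objects in memory, and the compact byte form is produced only when Java serialization is actually invoked.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a SAX event buffer; NOT Cocoon's SaxBuffer. */
final class EventBuffer implements Serializable {
    private static final long serialVersionUID = 1L;

    // In memory, events are kept as objects. The field is transient so that
    // default serialization skips it and the custom format below is used.
    private transient List<String[]> events = new ArrayList<>();

    void startElement(String name) { events.add(new String[] {"S", name}); }
    void characters(String text)   { events.add(new String[] {"C", text}); }
    void endElement(String name)   { events.add(new String[] {"E", name}); }

    int size() { return events.size(); }

    String describe(int i) {
        String[] e = events.get(i);
        return e[0] + ":" + e[1];
    }

    // Invoked by ObjectOutputStream. This is where an XMLC-style compiler
    // would run in the proposal; here a trivial encoding takes its place.
    // Note: writeUTF limits each string to 65535 encoded bytes.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(events.size());
        for (String[] e : events) {
            out.writeByte(e[0].charAt(0));
            out.writeUTF(e[1]);
        }
    }

    // Invoked by ObjectInputStream. This is where an XMLI-style interpreter
    // would replay the bytes back into event objects.
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int n = in.readInt();
        events = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            String type = String.valueOf((char) in.readByte());
            events.add(new String[] {type, in.readUTF()});
        }
    }
}
```

With this shape, callers that only buffer in memory never pay for the byte encoding, and a store that writes to disk gets it just by passing the buffer to an ObjectOutputStream.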

Sylvain

-- 
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Orixo, the opensource XML business alliance  -  http://www.orixo.com


