cocoon-dev mailing list archives

From: Torsten Curdt
Subject: Re: [RT] SAX stream buffering
Date: Wed, 19 Nov 2003 11:23:21 GMT

> I'm not very surprised by these numbers: XMLC does a pretty heavy job to 
> serialize Strings to bytes.
> Furthermore, I just looked at the XMLByteStreamCompiler.write() which 
> shows that it spends most of its time resizing the byte buffer, as 
> resizing is limited to the actual number of bytes needed for the current 
> write, and not by a larger growth increment.
> It would be interesting to redo the test by introducing this growth 
> increment. BTW, I don't understand the "this.buf.length << 1" in the 
> write() method.

Well, that's not exactly true:

buf.length << 1 is a left shift by one bit, which is the same
as buf.length * 2. The max() then chooses the bigger of the two
values, so the buffer at least doubles on every resize.

So that method is fine ;)
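
To illustrate the growth rule, here is a minimal sketch. It is not the actual XMLByteStreamCompiler code: the class name GrowableBuffer, the initial capacity of 4, and the capacity()/size() accessors are assumptions made for the example.

```java
import java.util.Arrays;

/** Hypothetical growable byte buffer; NOT the actual XMLByteStreamCompiler. */
public class GrowableBuffer {
    private byte[] buf = new byte[4]; // small initial capacity, chosen for the demo
    private int count = 0;            // number of bytes written so far

    public void write(byte[] b, int off, int len) {
        int newCount = count + len;
        if (newCount > buf.length) {
            // buf.length << 1 doubles the capacity; max() falls back to the
            // exact size needed when a single write is larger than that.
            buf = Arrays.copyOf(buf, Math.max(buf.length << 1, newCount));
        }
        System.arraycopy(b, off, buf, count, len);
        count = newCount;
    }

    public int capacity() { return buf.length; }

    public int size() { return count; }
}
```

With this rule a run of small writes triggers only a logarithmic number of resizes, while one oversized write still gets exactly the room it needs.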

>> But the huge difference between the SaxBuffer and the XMLC is that the 
>> XMLC serializes the SAX event on the fly. The SaxBuffer does not 
>> support serialization but keeps the events as objects.
>> IMO spending time on the serialization only makes sense if
>>  a) the memory consumption is too high otherwise
>>  b) the SAX stream is being saved to disk
>> Maybe we can extend the testcases to compare the memory consumption. 
>> For the question of the destination we could let the store decide.
>> Anyway both classes make sense. But maybe they would make even more 
>> sense if they would share the same interface and would become 
>> interchangeable.
>> The SAX stream buffering is a vital component of cocoon. Looking at 
>> the numbers the impact on the performance could be tremendous.
>> What do you think?
> Can't we merge both: use SAXBuffer for in-memory storage, and use 
> XMLC/XMLI to serialize it? This could even be done transparently by 
> having SAXBuffer implementing Serializable and use XMLC/XMLI to 
> implement readObject() and writeObject().

Hm... I don't know if I like that. Although it also came to my mind.

That way we *always* pay the memory consumption. It sounds reasonable
from an OOP POV, but it might not be a good choice in terms of
scalability ...I assume :-/
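
For reference, the merge proposed in the quoted text could look roughly like the sketch below. Everything in it is an assumption: EventBuffer is a hypothetical stand-in for SaxBuffer that stores events as plain strings, and the writeUTF/readUTF loops only mark the places where XMLC and XMLI would be called. It also shows the concern: the whole event list lives in memory until the buffer is serialized.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a SAX event buffer; NOT Cocoon's SaxBuffer. */
public class EventBuffer implements Serializable {
    private static final long serialVersionUID = 1L;

    // Kept in memory as objects; transient so default serialization skips it.
    private transient List<String> events = new ArrayList<>();

    public void add(String event) { events.add(event); }

    public List<String> events() { return events; }

    // Custom encoding hook; this is where XMLC would be plugged in.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(events.size());
        for (String e : events) {
            out.writeUTF(e);
        }
    }

    // Custom decoding hook; this is where XMLI would be plugged in.
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int n = in.readInt();
        events = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            events.add(in.readUTF());
        }
    }

    /** Serializes and deserializes a buffer in memory, for demonstration. */
    public static EventBuffer roundTrip(EventBuffer b) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(b);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (EventBuffer) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```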
