cocoon-dev mailing list archives

From simon.mi...@t-online.de (Simon Mieth)
Subject Re: DO NOT REPLY [Bug 23299] - [PATCH] UTFDataFormatException: String cannot be longer than 32k.
Date Wed, 12 Nov 2003 08:44:17 GMT
On Sat, 08 Nov 2003 18:22:08 +0100
Torsten Curdt <tcurdt@vafer.org> wrote:

> >> ...I was wondering - is this a bug of the component that produces the
> >> SAX events or the XMLByteStreamCompiler? I mean: now it's ok - but
> >> should we silently ignore the problem?
> > 
> > 
> > Torsten, I don't understand your concerns. Isn't the fix simply about 
> > handling text nodes longer than 32 k? Ok, they shouldn't occur that 
> > often (it's half a novel :-) ), but it's possible.
> 
> ...we duplicate events here and thereby modify the SAX stream.
> Should be no problem.... but who knows ;)
> 
> with the patch:
> 
>   characters(36k)
> ->
>   event
>   string 32k
>   event
>   string 4k
> 
> I guess it would be better to have it like this:
> 
>   characters(36k)
> ->
>   event
>   string 32k
>   string 4k
> 
> So what goes in comes out the same way.
> 
> We could also increase the max length of a stored
> character event in general. ...but that would waste
> 2 bytes per event. Hm...
> 
> What do you think?
> --
> Torsten
> 
Hi,

Why should we handle the UTFDataFormatException at all? The last solution ignores this
exception, doesn't it?
What is the difference between

event 
string 32k
string 4k

and
event
string 36k 

in the bytestream?

The question is whether we need the UTFDataFormatException or not. If not, a patch can simply
remove the statement if(string>32k){} and then we get the result:

event
string xxk (the limit is then the Java integer range)
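
To make the "one event, one long string" variant concrete, here is a minimal sketch. The byte layout (a 4-byte length followed by UTF-8 data) and the names LengthPrefixSketch, writeString and readString are assumptions for illustration only; this is not the CXML format and not the XMLByteStreamCompiler code. For comparison, the JDK's DataOutputStream.writeUTF uses a 2-byte length prefix and throws UTFDataFormatException once the encoded string exceeds 65535 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class LengthPrefixSketch {
    // Hypothetical encoding: 4-byte length (in bytes), then the UTF-8 data.
    static byte[] writeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(utf8.length); // int prefix: limit is the int range
            out.write(utf8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads back a string written by writeString.
    static String readString(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte[] utf8 = new byte[in.readInt()];
            in.readFully(utf8);
            return new String(utf8, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 36 * 1024; i++) {
            sb.append('x');
        }
        byte[] encoded = writeString(sb.toString());
        System.out.println("prefix + payload = " + encoded.length + " bytes");
        System.out.println("round trip ok: " + readString(encoded).equals(sb.toString()));
    }
}
```

Compared with a 2-byte prefix, this costs two extra bytes for every stored string, which is the overhead mentioned in the quoted message.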

Maybe I'm totally wrong, but I think the 32k string limitation comes from the CXML format
by Stefano Mazzocchi:
http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=97194999124269&w=2

As I understand it, the CXML format is independent of Cocoon and Java, so
anyone who writes a decoder in C can use that bytestream, too.

The SAX events should not be the problem; every SAX handler has to process the following correctly:

<node>
  text here 
  <!-- comment here -->
  text here
</node>

This gives a character event, a comment event and a character event for one node, or do I
misunderstand SAX processing totally?

If that's correct, a sequence of consecutive character events should not be a problem.
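
A small sketch of a handler that copes with text arriving in several pieces: it appends every characters() call to a buffer instead of assuming one call per text node. The class names CharacterEventsSketch and Collector are made up for this example. The number of characters() calls depends on the parser, so the code only counts them and does not rely on a particular value. Comment events are not collected here, because they are reported through LexicalHandler and not through DefaultHandler.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CharacterEventsSketch {
    // Accumulates all character data, however many events deliver it.
    static class Collector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        int characterEvents = 0;

        @Override
        public void characters(char[] ch, int start, int length) {
            characterEvents++;
            text.append(ch, start, length);
        }
    }

    // Parses the given XML string and returns the filled collector.
    static Collector parse(String xml) {
        Collector collector = new Collector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), collector);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return collector;
    }

    public static void main(String[] args) {
        Collector c = parse("<node>text here<!-- comment here -->text here</node>");
        System.out.println(c.characterEvents + " characters() call(s)");
        System.out.println("collected text: " + c.text);
    }
}
```

A consumer written this way gets the same text whether a producer sends one 36k event or a 32k event followed by a 4k event.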

Of course, the patch does not handle the splitting efficiently, and it may be better to write it
as a while loop in a separate splitBigStrings method.
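
A rough sketch of what such a splitBigStrings method could look like. The chunk size is taken from the "32k" figure in this thread, and the signature and return type are assumptions for illustration; this is not the code from the patch. It only computes the (offset, length) pairs, and a caller would emit one stored string per pair.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSketch {
    // Hypothetical limit, based on the "32k" figure discussed above.
    static final int MAX_CHUNK = 32 * 1024;

    // Splits the range [start, start + length) into pieces of at most
    // maxChunk chars; each entry is {offset, length}.
    static List<int[]> splitBigStrings(int start, int length, int maxChunk) {
        List<int[]> chunks = new ArrayList<>();
        int offset = start;
        int remaining = length;
        while (remaining > 0) {
            int n = Math.min(remaining, maxChunk);
            chunks.add(new int[] {offset, n});
            offset += n;
            remaining -= n;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // A 36k text node, as in the example from the thread.
        for (int[] c : splitBigStrings(0, 36 * 1024, MAX_CHUNK)) {
            System.out.println("string offset=" + c[0] + " length=" + c[1]);
        }
    }
}
```

For a 36k input this yields one 32k piece and one 4k piece. The sketch counts chars, not encoded bytes, and does not check whether a split falls between the two halves of a surrogate pair; a real implementation would have to consider both.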



Sorry if I'm wrong,

regards Simon



