cocoon-dev mailing list archives

From Torsten Curdt
Subject Re: DO NOT REPLY [Bug 23299] - [PATCH] UTFDataFormatException: String cannot be longer than 32k.
Date Wed, 12 Nov 2003 10:37:07 GMT
>>We could also increase the max length of a stored
>>character event in general. ...but that would waste
>>2 bytes per event. Hm...
>>What do you think?
> Hi,
> why should we handle the UTFDataFormatException at all? The last solution ignores
> this exception, doesn't it?
> Where is the difference between 
> event 
> string 32k
> string 4k
> and
> event
> string 36k 
> in the bytestream?
> The question is whether we need the UTFDataFormatException or not. If not, a patch can simply
> remove the statement if(string>32k){} and then we get the result:
> event
> string xxk (the limit is then the Java integer range)

Well, that's true ...but the current length is held
as a 15-bit integer. The highest bit decides whether
it's an index in a HashMap or not.
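
A minimal sketch of such a 16-bit header, assuming the flag bit is set
for an index and clear for an inline length (the polarity, the class
name and the method names are illustrative assumptions, not the actual
Cocoon code):

```java
// Hypothetical illustration of a 16-bit string header:
// 1 flag bit (index or inline length) + 15 value bits.
final class StringHeader {
    static final int INDEX_FLAG = 0x8000; // highest bit of the 16-bit header
    static final int MAX_VALUE = 0x7FFF;  // 15 bits, i.e. 32767

    /** Header for a string written inline with the given length. */
    static int forLength(int length) {
        if (length < 0 || length > MAX_VALUE) {
            throw new IllegalArgumentException("length does not fit in 15 bits: " + length);
        }
        return length;
    }

    /** Header that refers to an already stored string by its index. */
    static int forIndex(int index) {
        if (index < 0 || index > MAX_VALUE) {
            throw new IllegalArgumentException("index does not fit in 15 bits: " + index);
        }
        return INDEX_FLAG | index;
    }

    /** True if the header carries an index rather than a length. */
    static boolean isIndex(int header) {
        return (header & INDEX_FLAG) != 0;
    }

    /** The 15-bit length or index stored in the header. */
    static int value(int header) {
        return header & MAX_VALUE;
    }
}
```

With one bit spent on the flag, the largest length that fits is 32767,
which is where the 32k limit comes from.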

As I said, we could increase the length to 31 bits,
but that costs 2 additional bytes per character event.
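
To make the cost concrete, here is a rough sketch that compares a
2-byte and a 4-byte length header in front of the same payload. The
class and method names are made up for illustration, and the payload is
just dummy bytes:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical comparison of header sizes; not the actual Cocoon serializer.
final class HeaderCost {
    /** Total bytes for a payload preceded by a 16-bit header. */
    static int sizeWithShortHeader(int payloadBytes) {
        return size(payloadBytes, false);
    }

    /** Total bytes for a payload preceded by a 32-bit header. */
    static int sizeWithIntHeader(int payloadBytes) {
        return size(payloadBytes, true);
    }

    private static int size(int payloadBytes, boolean wideHeader) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            if (wideHeader) {
                out.writeInt(payloadBytes);   // 4-byte header
            } else {
                out.writeShort(payloadBytes); // 2-byte header
            }
            out.write(new byte[payloadBytes]); // dummy payload
            out.flush();
            return buf.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The difference is always 2 bytes, regardless of payload size, so it
hurts most for documents with many short text events.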

Stefano, did I explain this right?!

> Maybe I'm totally wrong, but I think the 32k string limitation comes from the CXML format
> from Stefano Mazzocchi

Yes, that's correct

> As I understand it, the CXML format is independent of Cocoon and Java,
> so if anyone writes a decoder in C, they can use that bytestream, too.


> The SAX events should not be the problem; every SAX handler has to process the following:
> <node>
>   text here 
>   <!-- comment here -->
>   text here
> </node>
> This gives a Character-Event, Comment-Event, Character-Event for one node, or do I misunderstand
> SAX processing totally?

That's correct if the text nodes are not affected by the patch.
If there is a character event larger than 32k the events come
out differently.

> If it's correct, a Character-Event,Character-Event,... should not be a problem.

   text here
   <!-- comment here -->
   long text here

could come out like

   "text here"
   <!-- comment here -->
   "long text""here"

and that relies on the transformer to normalize the text nodes!!
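
Roughly, the splitting and the normalization a consumer would then need
look like the sketch below. The names are hypothetical, and the limit is
counted in chars here as a simplification (a real implementation would
also have to avoid cutting a surrogate pair in half):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split one long character event into several
// smaller ones, and join adjacent chunks back together on the way out.
final class CharacterChunks {
    /** Splits text into consecutive chunks of at most maxLen chars. */
    static List<String> split(String text, int maxLen) {
        if (maxLen <= 0) {
            throw new IllegalArgumentException("maxLen must be positive");
        }
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < text.length(); start += maxLen) {
            int end = Math.min(text.length(), start + maxLen);
            chunks.add(text.substring(start, end));
        }
        return chunks;
    }

    /** Joins adjacent character chunks back into one text node. */
    static String normalize(List<String> chunks) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunks) {
            sb.append(chunk);
        }
        return sb.toString();
    }
}
```

A consumer that treats every characters event as a complete text node
would see the chunks from split() as separate nodes; only one that
buffers adjacent chunks, as normalize() does, gets the original text back.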

Let's not discuss this to death. I'll fix it :)

