forrest-dev mailing list archives

From Jeff Turner
Subject Re: BROKEN: UTFDataFormatException: String cannot be longer than 32k.
Date Thu, 02 Oct 2003 09:56:29 GMT
On Wed, Oct 01, 2003 at 09:08:12AM -0600, Adam R. B. Jack wrote:
> > I have had this problem with large .txt files being transformed.  IIRC,
> > the error actually comes from the XML parser, not forrest or even
> > cocoon.  I did some googling a while back and found a patch (I didn't
> > test it) but it looked like an ugly hack that just broke the string up
> > into parts and put it back together again.
> >
> > It looks like Jeff has filed this already as a cocoon bug. See
> >
> Thanks for that explanation/information, that makes sense. Any advice
> on if I should "wait patiently" on Jeff's bug entry, or if I should
> look at a hack workaround/fix? [I have no idea if this is something
> simple to fix at a low level, or high priority, or ... ]

It doesn't bite most people (although it bit FOP), and I don't think
anyone is actively investigating it.

> I could chunk data into < 32K pieces, but I'm not sure how I'd fake the
> parser into creating multiple text nodes. [I guess I could be 'ugly' and
> just split the file into multiple <source> entries.]

Hmm.  Could you preprocess the XML and insert a <!-- --> after every 31k
of text?  The comment should make the parser emit separate text nodes.
It's a very hacky workaround, but easier than fixing the bug.
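
A rough, untested-against-Cocoon sketch of that preprocessing step, in
Python.  The function name, the 31k chunk size, and the idea of escaping
each chunk after splitting are all assumptions for illustration; the real
limit may be counted in encoded bytes rather than characters, in which case
a smaller chunk size would be needed for non-ASCII text.  Splitting the raw
text before escaping avoids cutting an entity reference such as &amp; in
half.

```python
from xml.sax.saxutils import escape

# "31k" comes from the suggestion above; it is an assumed safe size,
# not a verified limit of the parser or of Cocoon.
CHUNK_CHARS = 31 * 1024
SEPARATOR = "<!-- -->"


def split_text_with_comments(text, chunk_chars=CHUNK_CHARS):
    """Return XML-escaped text with an empty comment between chunks.

    `text` is the raw (unescaped) content destined for one element.
    Each chunk holds at most `chunk_chars` characters of the raw text.
    """
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be positive")
    # Slice the raw text first, then escape each slice on its own, so a
    # chunk boundary can never fall inside an escaped entity.
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    return SEPARATOR.join(escape(chunk) for chunk in chunks)
```

The result would be written between the <source> tags in place of the
original text.  Whether the downstream pipeline keeps the text nodes
separate after the comments is something to check against the actual
setup.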


> Thanks again.
> regards
> Adam
