cocoon-users mailing list archives

From Tobia Conforto
Subject Re: Parsing HTML entities
Date Fri, 31 Aug 2007 16:02:35 GMT
Never mind, I solved it "by hand".

I wrote a Python script that takes a list of HTML entities and generates
a huge tree of switch() { case: switch () { case: switch () { case: ...
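
The script itself was not posted in this message, so the following is only
a minimal sketch of how such a generator could work. It assumes the entity
list is a dict of name to code point, and that the generated Java method
(hypothetically named match) is called with i pointing just past the '&'.
The names build_trie, emit_switch and generate are made up for
illustration, and bounds checks in the generated Java are left out.

```python
def build_trie(entities):
    """Nested dicts keyed by character; the key None marks a complete entity."""
    root = {}
    for name, codepoint in entities.items():
        node = root
        for ch in name + ";":
            node = node.setdefault(ch, {})
        node[None] = codepoint
    return root


def emit_switch(node, depth=0, indent="    "):
    """Return Java source lines for one trie level: a switch on buf[i + depth]."""
    pad = indent * (2 * depth)
    if None in node:
        # Whole entity matched, including the ';': return its code point.
        return [f"{pad}return {node[None]};"]
    lines = [f"{pad}switch (buf[i + {depth}]) {{"]
    for ch in sorted(node):
        lines.append(f"{pad}{indent}case '{ch}':")
        lines.extend(emit_switch(node[ch], depth + 1, indent))
    lines.append(f"{pad}{indent}default: return -1;")
    lines.append(f"{pad}}}")
    return lines


def generate(entities):
    """Wrap the nested switches in a (hypothetical) static Java method."""
    body = emit_switch(build_trie(entities))
    lines = ["static int match(char[] buf, int i) {"]
    lines += ["    " + line for line in body]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Two entities only; the real list would hold the full HTML entity set.
    print(generate({"amp": 38, "lt": 60}))
```

Each trie level becomes one switch, so entities sharing a prefix share the
outer cases and the depth of the nesting equals the longest entity name
plus one for the ';'.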

The generated Java code goes through a char[] in a single pass and when
it recognizes an entity it pushes the associated Unicode char into the
SAX stream, instead of the chars composing the entity.
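
To make the scanning behaviour concrete, here is a rough model of it. This
is an assumption-laden sketch, not the generated Java: a dict-based trie
stands in for the nested switches, a plain string stands in for the char[],
and appending to a list stands in for pushing characters into the SAX
stream. The function name replace_entities is made up for illustration.

```python
def replace_entities(text, entities):
    """Copy text to the output, replacing each known '&name;' by its character."""
    # Trie of nested dicts; the key None marks the end of a complete entity.
    trie = {}
    for name, codepoint in entities.items():
        node = trie
        for ch in name + ";":
            node = node.setdefault(ch, {})
        node[None] = codepoint

    out = []  # stand-in for the SAX character events
    i, n = 0, len(text)
    while i < n:
        if text[i] == "&":
            # Walk the trie as far as the input allows.
            node, j = trie, i + 1
            while j < n and text[j] in node:
                node = node[text[j]]
                j += 1
            if None in node:
                # Complete entity recognized: emit one char, skip the entity.
                out.append(chr(node[None]))
                i = j
                continue
        # Not an entity (or an unknown one): pass the character through.
        out.append(text[i])
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(replace_entities("a &lt; b &amp; c", {"amp": 38, "lt": 60}))
```

Unknown or unterminated entities are passed through unchanged. A failed
match only re-reads as many characters as the longest entity name, so the
work stays linear in the input length.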

It's pretty brutal and produces a 36k class file, but it's the fastest
thing that could possibly do the job, short of writing a C extension!
On some data, the pattern transformer took 800ms where mine takes 2ms!

If anybody is interested, I can post or email the code.

Joerg Heinicke wrote:
> That's one of the rare cases where I consider
> <xsl:text disable-output-escaping="yes"> a valid approach

Yes, that was the first thing I tried, but I discarded it as it was
causing more problems than it solved.

