maven-doxia-dev mailing list archives

From Lukas Theussl
Subject entities: text or rawText?
Date Mon, 04 May 2009 09:47:33 GMT


I'm trying to understand some of the issues we have with entities in the 
XmlParser. Is there a special reason why entities are emitted as rawText and not text?

I think they should be emitted as text:

First, custom entities can be used simply to define replacement text inside
documents (e.g. <!ENTITY version "1.0">).

Second, the resulting events should be consumable by all sinks, not just
X(HT)ML-based ones. Consider for instance the text "&amp;&AElig;" (where AElig
is defined as <!ENTITY AElig  "&#198;">). Currently the XhtmlBaseParser emits
it as one text event "&" and one rawText event "&#198;". This means that
e.g. the LaTeX Sink will produce wrong output (the AElig should be converted
to "\AE" in LaTeX).

IMO the resolved entity should be emitted in a format-independent way, e.g. as
one (Unicode?) character, just like &amp; is emitted as one character above.
The consuming sink then has to transform that into a format-specific
representation.
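
A rough sketch of the proposal above, in plain Java. It does not use the Doxia
API: TextSink, LatexSketchSink and resolveCharRef are hypothetical names made
up for illustration, and the "\AE{}" mapping is just one way a LaTeX sink
could render the character. The parser side resolves the entity to a single
Unicode character and emits it as text; the sink alone decides the
format-specific output.

```java
public class EntityTextSketch {

    /** Hypothetical stand-in for a sink; NOT the real Doxia Sink interface. */
    interface TextSink {
        void text(String text);
    }

    /**
     * Resolves a numeric character reference such as "&#198;" or "&#xC6;"
     * to the character it denotes. Anything else is returned unchanged.
     */
    static String resolveCharRef(String ref) {
        if (!ref.startsWith("&#") || !ref.endsWith(";")) {
            return ref; // not a numeric reference; leave it untouched
        }
        String body = ref.substring(2, ref.length() - 1);
        boolean hex = body.startsWith("x") || body.startsWith("X");
        int codePoint = hex
                ? Integer.parseInt(body.substring(1), 16)
                : Integer.parseInt(body);
        return new String(Character.toChars(codePoint));
    }

    /** Toy LaTeX sink (assumption): converts characters only on output. */
    static class LatexSketchSink implements TextSink {
        final StringBuilder out = new StringBuilder();

        @Override
        public void text(String text) {
            for (int i = 0; i < text.length(); i++) {
                char c = text.charAt(i);
                switch (c) {
                    case '&':
                        out.append("\\&");
                        break;
                    case '\u00C6': // LATIN CAPITAL LETTER AE
                        out.append("\\AE{}");
                        break;
                    default:
                        out.append(c);
                }
            }
        }
    }

    public static void main(String[] args) {
        LatexSketchSink sink = new LatexSketchSink();
        sink.text("&");                      // what &amp; resolves to
        sink.text(resolveCharRef("&#198;")); // what &AElig; resolves to
        System.out.println(sink.out);        // prints \&\AE{}
    }
}
```

With both entities arriving as text events, an X(HT)ML sink would escape them
in its own way (e.g. back to "&amp;" and "&#198;"), so no sink needs to
understand another format's raw markup.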

