Well, the rationales I usually hear for using mnemonic
entities are:
- easier to type in than character references
- easier to read than character references
"Special characters" are usually characters which cannot
reasonably be expected to be on the keyboard and/or in the
screen font of the machines of the involved parties.
My rebuttals have been:
- "easier to read": after a transformation, the "special
characters" will be either encoded as every other
character or as a character reference -> no gain here
- "easier to type in": an issue, but editors with
customizable facilities for entering "special characters"
are abundant by now -> not much value added by mnemonic
entities.
Note that one XML2XML transformation is the V10->V11
transformation.
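
To illustrate the "easier to read" rebuttal, here is a
minimal sketch in Python, assuming the standard library
xml.etree.ElementTree parser. The sample document and its
nbsp declaration are made up for illustration. After one
parse/serialize round trip the mnemonic name is gone, and
the character comes out either as itself or as a numeric
character reference, depending on the output encoding.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a mnemonic entity declared in the internal subset.
SOURCE = '<!DOCTYPE p [<!ENTITY nbsp "&#160;">]><p>a&nbsp;b</p>'


def roundtrip(xml_text):
    """Parse the text, then serialize it twice with different encodings."""
    root = ET.fromstring(xml_text)
    # Unicode output: the character is written like any other character.
    as_unicode = ET.tostring(root, encoding="unicode")
    # ASCII output: the character becomes a numeric character reference.
    as_ascii = ET.tostring(root, encoding="us-ascii")
    return as_unicode, as_ascii


as_unicode, as_ascii = roundtrip(SOURCE)
print(repr(as_unicode))
print(as_ascii)
```

Neither result contains &nbsp; any more, so whoever reads
the transformed document gains nothing from the mnemonic.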
David Crossley wrote:
> I see special characters being used often.
In Apache xdocs? We are talking about these only.
> However, as you indicate, then we must incur the "overhead"
> of the catalog entity resolver. So yes, this is a solution
> but not a desirable one.
The problem is that catalog support is not necessarily a
given, not part of the XML standard, and implementations
vary in conformance to the OASIS standard or even use
their own proprietary format.
>>- Entities don't mix well with non-DTD validation.
> Not sure what you mean here. I am yet to properly
> experiment with Relax NG.
Entities can only be declared in a DTD. This means you
always have to declare a DTD or an internal subset in
order to get the mnemonic entities resolved, even if
you don't want to validate against the DTD but against
another schema.
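
A small sketch of this, again in Python with the standard
xml.etree.ElementTree parser (the sample strings and the
helper name try_parse are hypothetical). The same content
fails to parse without a declaration and parses once an
internal subset declares the entity, whether or not any
validation takes place.

```python
import xml.etree.ElementTree as ET

# Hypothetical samples: identical content, with and without a declaration.
WITHOUT_DTD = '<p>a&nbsp;b</p>'
WITH_SUBSET = '<!DOCTYPE p [<!ENTITY nbsp "&#160;">]>' + WITHOUT_DTD


def try_parse(xml_text):
    """Return the text of the root element, or a note about the failure."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError as err:
        # An undeclared mnemonic entity is a well-formedness error here.
        return "parse error: %s" % err


print(try_parse(WITHOUT_DTD))
print(repr(try_parse(WITH_SUBSET)))
```

The first call reports a parse error; the second returns
the text with the no-break space in place.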
> They are there because processing and validation will
> break without them.
This is just a statement. For example, none of the FOP xdocs
use any mnemonic entity, therefore there is nothing to break
there.
>> Has somebody checked the whole lot of the Apache how often
>> the entities are actually used?
> That would be an interesting exercise. Perhaps we could
> try it out with the current set of Cocoon xdocs. Could
> someone devise a stylesheet to detect and summarise them?
<bg> A style sheet? It won't see whether the character got
in as a character reference, a mnemonic entity or normally
encoded. Does Xerces report entity usage? Otherwise, a Perl script
might be in order.
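
Such a script could look roughly like the following sketch.
It is in Python rather than Perl, and it is deliberately
naive: it scans the raw source text with a regular
expression, ignores the five predefined XML entities, and
does not skip comments or CDATA sections. The function
names are made up for illustration.

```python
import re
from collections import Counter

# The five entities predefined by XML itself; these need no DTD.
PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Simplified name pattern; numeric references such as &#160; do not match
# because '#' cannot start a name.
ENTITY_REF = re.compile(r"&([A-Za-z_:][A-Za-z0-9_.:\-]*);")


def count_mnemonic_entities(text):
    """Count references to named entities other than the predefined ones."""
    names = ENTITY_REF.findall(text)
    return Counter(name for name in names if name not in PREDEFINED)


def count_in_files(paths):
    """Sum the counts over the raw, unparsed text of several files."""
    total = Counter()
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            total += count_mnemonic_entities(handle.read())
    return total


sample = "<p>&nbsp;x &amp; &#160; &copy;&nbsp;</p>"
print(count_mnemonic_entities(sample))
```

Working on the unparsed text is the point: once a parser
has touched the file, the information is gone.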
> It is a complex beast. I too would like it simplified.
> --David
A laudable goal.
Entities get in the way of many of the XML processing
approaches developed in recent years. The value mnemonic
entities originally added has been diminished by UTF-8 and
new tools. It is time to check whether dealing with them
is still worth the trouble.
J.Pietschmann