forrest-dev mailing list archives

From "J.Pietschmann" <j3322...@yahoo.de>
Subject Re: Entities for characters references
Date Fri, 28 Jun 2002 19:04:32 GMT
Well, the rationales I usually hear for using mnemonic
entities are:
- easier to type in than character references
- easier to read than character references
"Special characters" are usually characters which cannot
reasonably be expected to be on the keyboard and/or in the
screen font of the machines of the involved parties.

My rebuttals have been:
- "easier to read": after a transformation, the "special
  characters" will be either encoded as every other
  character or as a character reference -> no gain here
- "easier to type in": an issue, but editors with
  customizable facilities for entering "special characters"
  are abundant by now -> not much value added by mnemonic
  entities.
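
A minimal sketch of the "easier to read" point, in Python and
assuming the standard library's xml.etree.ElementTree as the
processor (the sample document and the eacute entity are made
up for illustration). Once the document has been parsed and
written out again, the mnemonic name is gone either way:

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a mnemonic entity declared in an internal subset.
SOURCE = '<!DOCTYPE doc [<!ENTITY eacute "&#233;">]><doc>caf&eacute;</doc>'

# The parser expands the entity; the tree only holds the character.
root = ET.fromstring(SOURCE)

# Default serialization is ASCII, so the character becomes a
# numeric character reference.
as_charref = ET.tostring(root).decode("ascii")

# Serializing to a Unicode string keeps the character itself,
# encoded like every other character.
as_encoded = ET.tostring(root, encoding="unicode")

print(as_charref)
print(as_encoded)
```

Other processors may behave differently in detail, but none of
the usual output methods restores the mnemonic name.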

Note that one XML2XML transformation is the V10->V11
transformation.

David Crossley wrote:
> I see special characters being used often.
In Apache xdocs? We are talking about these only.

> However, as you indicate, then we must incur the "overhead"
> of the catalog entity resolver. So yes, this is a solution
> but not a desirable one.

The problem is that catalog support is not necessarily a
given, not part of the XML standard, and implementations
vary in conformance to the OASIS standard or even use
their own proprietary format.

>>- Entities don't mix well with non-DTD validation.
> Not sure what you mean here. I am yet to properly
> experiment with Relax NG.

Entities can only be declared in a DTD. This means you
always have to declare a DTD or an internal subset in
order to get the mnemonic entities resolved, even if
you don't want to validate against the DTD but against
another schema.
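
A small sketch of this, in Python and assuming the standard
library's xml.etree.ElementTree (the documents and the helper
parse_text are hypothetical). Without any declaration the
mnemonic entity is a well-formedness error; an internal subset
fixes that; a character reference needs no DTD at all:

```python
import xml.etree.ElementTree as ET

# Three hypothetical variants of the same one-element document.
WITHOUT_DTD = "<doc>caf&eacute;</doc>"
WITH_SUBSET = '<!DOCTYPE doc [<!ENTITY eacute "&#233;">]><doc>caf&eacute;</doc>'
WITH_CHARREF = "<doc>caf&#233;</doc>"


def parse_text(xml):
    """Return the root element's text, or None if parsing fails."""
    try:
        return ET.fromstring(xml).text
    except ET.ParseError:
        # An undeclared entity makes the document not well-formed.
        return None


for label, xml in [("no DTD", WITHOUT_DTD),
                   ("internal subset", WITH_SUBSET),
                   ("character reference", WITH_CHARREF)]:
    print(label, "->", repr(parse_text(xml)))
```

Whatever schema language does the validating, the DOCTYPE has
to stay in the second variant just to make the entity known.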

> They are there because processing and validation will
> break without them.

This is just a statement. For example, none of the FOP xdocs
use any mnemonic entity, therefore there is nothing to break
there.

>> Has somebody checked the whole lot of the Apache how often
>> the entities are actually used?
> That would be an interesting exercise. Perhaps we could
> try it out with the current set of Cocoon xdocs. Could
> someone devise a stylesheet to detect and summarise them?
<bg> A style sheet? It won't see whether the character got
in as a character reference, a mnemonic entity, or normally encoded.
Does Xerces notify entity usage? Otherwise, a Perl script
might be in order.
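
Such a script could look roughly like the following, sketched
here in Python rather than Perl. It scans the raw text before
any parser has expanded anything. The function name and the
sample string are made up, and the regular expression is a
simplification: it also counts references inside comments and
CDATA sections, and it does not look at the DTD itself.

```python
import re
from collections import Counter

# The five entities predefined by XML; these need no DTD.
PREDEFINED = {"lt", "gt", "amp", "apos", "quot"}

# Matches &name; as well as &#123; and &#x7B; in raw text.
REFERENCE = re.compile(
    r"&(#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z_:][A-Za-z0-9_:.-]*);"
)


def count_references(text):
    """Count references in raw XML text, grouped by kind."""
    counts = {
        "mnemonic": Counter(),
        "numeric": Counter(),
        "predefined": Counter(),
    }
    for match in REFERENCE.finditer(text):
        name = match.group(1)
        if name.startswith("#"):
            kind = "numeric"
        elif name in PREDEFINED:
            kind = "predefined"
        else:
            kind = "mnemonic"
        counts[kind][name] += 1
    return counts


sample = "<p>caf&eacute; &amp; th&#233; &eacute; &#xE9;</p>"
for kind, counter in count_references(sample).items():
    print(kind, dict(counter))
```

Run over every file of a set of xdocs and summed up, the
"mnemonic" counter would answer the question of how often the
entities are actually used.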

> It is a complex beast. I too would like it simplified.
> --David
A laudable goal.
Entities get in the way of many of the XML processing
approaches developed in recent years. The value mnemonic
entities originally added has been diminished by UTF-8 and
new tools. It is time to check whether dealing with them
is still worth the trouble.

J.Pietschmann


