xml-general mailing list archives

From "Armin Pfarr" <apf...@netsurf.de>
Subject Character Entities revisited
Date Wed, 17 Nov 1999 10:33:33 GMT
Hello,

There seems to be a bug in Cocoon 1.5 that isn't covered by the FAQ:
external entities don't get included, even if they are referenced by
absolute URLs. Is there a possible workaround?

To give you a more precise view of the question:

I have an XML DTD that looks like
"
<!--
DOCTYPE memo
        Typical invocation:
        <!DOCTYPE memo PUBLIC '-//Armin Pfarr//DTD Memo V0.9//EN'>
Purpose:
-->

<!-- Declaration of parameter entities -->
	<!ENTITY % cm.simple-para
	        'p | ul | ol | dl'>
	... rest omitted
<!-- End of declaration of parameter entities -->


<!-- The CALS table-model is to be used -->
	<!ENTITY % cals-table
      	  SYSTEM "http://localhost/dtd/Entities/calstbl.pen">
	<!ENTITY % tblelm
      	   "table">
	<!ENTITY % tblmdl
      	   "((tgroup+), title?, caption?)">
	<!ENTITY % tblexpt
      	   "">
	<!ENTITY % tblcon
      	   "%cm.table-cell-content; ">
	<!ENTITY % bodyatt
      	   "      id ID #IMPLIED ">

<!-- Declaration of character entities -->
	<!ENTITY % iso-latin1
      	  SYSTEM "http://localhost/dtd/Entities/ISOlat1.pen">
	%iso-latin1;
	<!ENTITY % iso-num
        	SYSTEM "http://localhost/dtd/Entities/ISOnum.pen">
	%iso-num;
	... rest omitted
<!-- End of declaration of character entities -->

<!-- Declaration of document structure -->
	<!ELEMENT	memo
		(from,to,cc?,body) >
	... rest omitted
<!-- End of declaration of document structure -->
"

The referenced entities are defined in files like "ISOLat1.pen" that look
like

"
    <!ENTITY Agrave  "&#192;" ><!-- (#x00C0) capital A, grave accent -->
    <!ENTITY Aacute  "&#193;" ><!-- (#x00C1) capital A, acute accent -->
    <!ENTITY Acirc   "&#194;" ><!-- (#x00C2) capital A, circumflex accent -->
    <!ENTITY Atilde  "&#195;" ><!-- (#x00C3) capital A, tilde -->
    <!ENTITY Auml    "&#196;" ><!-- (#x00C4) capital A, dieresis or umlaut mark -->
    <!ENTITY Aring   "&#197;" ><!-- (#x00C5) capital A, ring -->
    <!ENTITY AElig   "&#198;" ><!-- (#x00C6) capital AE diphthong (ligature) -->
    <!ENTITY Ccedil  "&#199;" ><!-- (#x00C7) capital C, cedilla -->
    <!ENTITY Egrave  "&#200;" ><!-- (#x00C8) capital E, grave accent -->
    <!ENTITY Eacute  "&#201;" ><!-- (#x00C9) capital E, acute accent -->
    <!ENTITY Ecirc   "&#202;" ><!-- (#x00CA) capital E, circumflex accent -->
    <!ENTITY Euml    "&#203;" ><!-- (#x00CB) capital E, dieresis or umlaut mark -->
    <!ENTITY Igrave  "&#204;" ><!-- (#x00CC) capital I, grave accent -->
    <!ENTITY Iacute  "&#205;" ><!-- (#x00CD) capital I, acute accent -->
    <!ENTITY Icirc   "&#206;" ><!-- (#x00CE) capital I, circumflex accent -->
    ... rest omitted
"
The entity files are accessible via absolute URLs, for example
"http://localhost/dtd/Entities/ISOLat1.pen"

My XSL-file looks like
"
<xsl:stylesheet version="1.0"
		xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
		xmlns:fo="http://www.w3.org/XSL/Format/1.0">

	<xsl:template match="memo">
	<!--
	Left out to produce only a dynamic XML-instance
	<xsl:processing-instruction
name="cocoon-format">type="text/xslfo"</xsl:processing-instruction>
	-->
	<fo:root xmlns:fo="http://www.w3.org/XSL/Format/1.0">
      <fo:layout-master-set>
      <fo:simple-page-master
        page-master-name="right"
        margin-top="75pt"
        margin-bottom="25pt"
        margin-left="100pt"
        margin-right="50pt">
        <fo:region-body margin-bottom="50pt"/>
        <fo:region-after extent="25pt"/>
      </fo:simple-page-master>
      <fo:simple-page-master
        page-master-name="left"
        margin-top="75pt"
        margin-bottom="25pt"
        margin-left="50pt"
        margin-right="100pt">
        <fo:region-body margin-bottom="50pt"/>
        <fo:region-after extent="25pt"/>
      </fo:simple-page-master>
      </fo:layout-master-set>

      <fo:page-sequence>

        <fo:sequence-specification>
          <fo:sequence-specifier-alternating
            page-master-first="right"
            page-master-odd="right"
            page-master-even="left"/>
        </fo:sequence-specification>

        <fo:static-content flow-name="xsl-after">
          <fo:block text-align-last="centered"
font-size="10pt"><fo:page-number/></fo:block>
        </fo:static-content>

        <fo:flow>
          <xsl:apply-templates/>
        </fo:flow>
      </fo:page-sequence>

    </fo:root>
  </xsl:template>

  <xsl:template match="lesson/title">
    <fo:block font-size="18pt" text-align-last="centered"
space-before.optimum="36pt" >
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>
  ...rest omitted
"

My XML-Instance looks like
"
<?xml version="1.0" encoding="UTF-8"	?>
<?xml-stylesheet href="memo.xsl" type="text/xsl"?>
<?cocoon-process type="xslt"?>
<!DOCTYPE memo SYSTEM "http://localhost/dtd/doctypes/memo_0_9/memo.dtd" >
<memo>
	<from>arp</from>
	<to>ths</to>
	<cc>uwe</cc>
	<body><p>Hello H&auml;gar</p></body>
</memo>
"

When I process the file with LotusXSL (18.5) from the command line, the
entity references are resolved and replaced with their Unicode counterparts
in the result instance. When I do the same through Cocoon (using the same
processor, LotusXSL 18.5, and the same parser, IBM xml4j), the resulting
document contains neither the entity references nor the Unicode characters:
the entities have simply vanished.
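
The symptom can be reproduced outside Cocoon. What follows is only a
minimal sketch, not the toolchain described above: it assumes Python's
standard xml.sax (expat) parser as a stand-in for xml4j, with loading of
external entities switched off explicitly. The names Recorder, parse,
EXTERNAL_DTD_DOC and INTERNAL_SUBSET_DOC are made up for the illustration.
A non-validating parser that never reads the external DTD cannot know what
&auml; stands for, so it skips the reference and the character is missing
from the output. The same reference is expanded once its declaration is
visible to the parser.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# Same shape as the memo instance: the entity is declared only in the external DTD.
EXTERNAL_DTD_DOC = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE memo SYSTEM "http://localhost/dtd/doctypes/memo_0_9/memo.dtd">\n'
    '<memo><body><p>Hello H&auml;gar</p></body></memo>'
)

# Control case (assumption for comparison only): the declaration is visible to the parser.
INTERNAL_SUBSET_DOC = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE memo [ <!ENTITY auml "&#228;"> ]>\n'
    '<memo><body><p>Hello H&auml;gar</p></body></memo>'
)


class Recorder(ContentHandler):
    """Collect character data and the names of entities the parser skipped."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.skipped = []

    def characters(self, content):
        self.text.append(content)

    def skippedEntity(self, name):
        self.skipped.append(name)


def parse(document):
    """Parse an in-memory document; return (text, skipped entity names)."""
    recorder = Recorder()
    parser = xml.sax.make_parser()
    # Do not fetch the external DTD or any external entity files.
    parser.setFeature(feature_external_ges, False)
    parser.setContentHandler(recorder)
    parser.parse(io.BytesIO(document.encode("utf-8")))
    return "".join(recorder.text), recorder.skipped


if __name__ == "__main__":
    for label, doc in (("external DTD", EXTERNAL_DTD_DOC),
                       ("internal subset", INTERNAL_SUBSET_DOC)):
        text, skipped = parse(doc)
        print(label, repr(text), "skipped:", skipped)
```

If Cocoon hands the document to the parser in a way that keeps the external
DTD from being read, the first case is what the output would look like. That
is a guess about the cause, not something confirmed for Cocoon 1.5.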

Copying the entity declarations into the DTD itself cannot be the solution
of choice: ISOLat1, CALS and other entity sets are maintained by external
institutions and should therefore always be referenced rather than included
(preferably by PUBLIC identifiers only, but somebody at the W3C missed that
idea during the development of XML).

Armin Pfarr

