cocoon-users mailing list archives

From "Friso van Vollenhoven"
Subject parse error on XHTML DTD
Date Fri, 26 May 2006 13:05:53 GMT
Hi all,

For testing purposes I have these two entries in my Cocoon sitemap:
<map:match pattern="TestPdf">
  <map:generate src="file:///C:/cvo.html" />
  <map:transform src="transform/show/xhtml2fo.xsl" />
  <map:serialize type="fo2pdf" mime-type="application/pdf" />
</map:match>

<map:match pattern="TestXhtml">
  <map:generate src="file:///C:/cvo.html" />
  <map:serialize type="xhtml" />
</map:match>

What I need to do is generate a PDF from some HTML source (which is valid
XHTML 1.0 Strict). Because the FOPGenerator does not accept the XSL-FO
generated by the transformation (xhtml2fo.xsl), I wanted to strip down the
HTML a bit in order to get FOP to actually generate the PDF. Using these two
matchers I should be able to easily test what parts of the HTML cause
FOP-problems (it has to do with nested tables; the HTML is valid, but not
very elegant and tables are heavily used for layout). In the real scenario
the HTML does not come from a disk file, but is generated from an XML source
using a Cocoon pipeline of transformations. I can't go from the XML source
to XSL-FO to PDF directly, because legal issues require that the PDF closely
resembles the document 'as seen by the user'.

What happens is that I get an exception when I browse to either one of the
matchers (TestPdf or TestXhtml):

org.apache.cocoon.ProcessingException: Failed to execute pipeline.: Exception: Using original entity definition for "&quot;".
    at .processXMLPipeline(
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(Abs
    at org.apache.cocoon.components.treeprocessor.sitemap.SerializeNode.invoke(Seri
    at keNodes(
    at e(
    at keNodes(
    at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(Pipel
    ..... a lot more

Caused by:
org.xml.sax.SAXParseException: Using original entity definition for "&quot;".
    at org.apache.crimson.parser.Parser2.warning(
    at org.apache.crimson.parser.Parser2.maybeEntityDecl(
    at org.apache.crimson.parser.Parser2.maybeMarkupDecl(
    at org.apache.crimson.parser.Parser2.maybePEReference(
    at org.apache.crimson.parser.Parser2.maybeDoctypeDecl(
    at org.apache.crimson.parser.Parser2.parseInternal(
    at org.apache.crimson.parser.Parser2.parse(
    at org.apache.crimson.parser.XMLReaderImpl.parse(
    at org.apache.excalibur.xml.impl.JaxpParser.parse(
    at org.apache.excalibur.xml.impl.JaxpParser.parse(
    at org.apache.cocoon.components.source.SourceUtil.parse(
    ...... a lot more again
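
For context: the XML 1.0 specification says that when an entity is declared more than once, the first declaration is binding and a processor may issue a warning. The trace above starts in Parser2.warning(, which suggests (this is an inference, not something the trace proves) that the installed SAX ErrorHandler turns that warning into an exception. A minimal sketch of a handler that records warnings but still fails on real errors is shown below. The class name LenientErrorHandler is hypothetical, and whether Cocoon 2.1.5 allows a different handler to be plugged in is not confirmed here.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

/** Hypothetical handler: collects warnings, rethrows errors and fatal errors. */
public class LenientErrorHandler implements ErrorHandler {
    private final List<String> warnings = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        // A warning (for example a repeated entity declaration) is recorded, not thrown.
        warnings.add(e.getMessage());
    }

    @Override
    public void error(SAXParseException e) throws SAXParseException {
        // Recoverable errors still abort the parse in this sketch.
        throw e;
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        // Well-formedness violations always abort the parse.
        throw e;
    }

    /** Returns the messages of all warnings seen so far. */
    public List<String> getWarnings() {
        return warnings;
    }
}
```

With a plain SAX XMLReader, such a handler would be installed through setErrorHandler(new LenientErrorHandler()) before calling parse.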

When I remove the DOCTYPE declaration from the HTML, the error does not
occur (and I can view the HTML or PDF versions of the page in a browser).
But removing the DOCTYPE declaration is not an option, because I need (or
want) to use entity references such as &nbsp; and the like.

Basically, the HTML file looks like:
<?xml version="1.0" ?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
<!-- body content, a lot of tables, &nbsp;'s, &Eacute;'s and more -->

When I remove the DOCTYPE declaration, all works fine if there are no entity
references in the HTML. If I add a &nbsp; or something and I leave the
DOCTYPE out, I get another exception thrown by the parser (because obviously
the referenced entity could not be found).
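
The behaviour described above can be reproduced outside Cocoon with plain JAXP. The sketch below is an illustration under assumptions, not the Cocoon configuration: it parses a small document with a non-validating parser and uses an EntityResolver to answer the XHTML 1.0 Strict public identifier with a tiny stand-in DTD that declares only nbsp and Eacute. The class name EntityDemo, the stand-in DTD, and the sample documents are all hypothetical; the system identifier in the sample is the standard W3C one and is assumed, since the original file's DOCTYPE is not shown in full.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EntityDemo {
    static final String XHTML_STRICT_PUBLIC_ID = "-//W3C//DTD XHTML 1.0 Strict//EN";

    // Stand-in for the real XHTML DTD (assumption): declares just two entities.
    static final String MINI_DTD =
        "<!ENTITY nbsp \"&#160;\">\n<!ENTITY Eacute \"&#201;\">\n";

    static final String WITH_DOCTYPE =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE html PUBLIC \"" + XHTML_STRICT_PUBLIC_ID + "\" "
        + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
        + "<html><body>a&nbsp;b &Eacute;</body></html>";

    static final String WITHOUT_DOCTYPE =
        "<?xml version=\"1.0\"?>\n<html><body>a&nbsp;b</body></html>";

    /** Parses the XML and returns the text content of the root element. */
    static String parseText(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // no validation; the DTD is still read for entities
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) -> {
            if (XHTML_STRICT_PUBLIC_ID.equals(publicId)) {
                // Serve the DTD from memory instead of fetching it.
                return new InputSource(new StringReader(MINI_DTD));
            }
            return null; // fall back to the parser's default resolution
        });
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    /** Like parseText, but returns null when the parser rejects the input. */
    static String tryParse(String xml) {
        try {
            return parseText(xml);
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("with DOCTYPE parsed: " + (tryParse(WITH_DOCTYPE) != null));
        System.out.println("without DOCTYPE parsed: " + (tryParse(WITHOUT_DOCTYPE) != null));
    }
}
```

The same effect can be had without any DTD by writing numeric character references (&#160; instead of &nbsp;), which need no declaration.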

I have Crimson 1.1.3 and Cocoon 2.1.5.

Has anyone ever encountered this problem? And, the real Cocoon question: can
I get Cocoon to NOT attempt to validate the XML that comes from the

Any help on this will be very much appreciated! I am kind of stuck on this,
and searching other resources didn't help.

All the best,
Friso van Vollenhoven
MP Objects
