cocoon-users mailing list archives

From "Friso van Vollenhoven" <friso.vanvollenho...@mp-objects.com>
Subject parse error on XHTML DTD
Date Fri, 26 May 2006 13:05:53 GMT
Hi all,

For testing purposes I have these two entries in my Cocoon sitemap:
<map:match pattern="TestPdf">
  <map:generate src="file:///C:/cvo.html" />
  <map:transform src="transform/show/xhtml2fo.xsl" />
  <map:serialize type="fo2pdf" mime-type="application/pdf" />
</map:match>

<map:match pattern="TestXhtml">
  <map:generate src="file:///C:/cvo.html" />
  <map:serialize type="xhtml" />
</map:match>

What I need to do is generate a PDF from some HTML source (which is valid
XHTML 1.0 Strict). Because the FOPGenerator does not accept the XSL-FO
generated by the transformation (xhtml2fo.xsl), I wanted to strip down the
HTML a bit in order to get FOP to actually generate the PDF. Using these two
matchers I should be able to easily test what parts of the HTML cause
FOP-problems (it has to do with nested tables; the HTML is valid, but not
very elegant and tables are heavily used for layout). In the real scenario
the HTML does not come from a disk file, but is generated from an XML source
using a Cocoon pipeline of transformations. I can't go from the XML source
to XSL-FO to PDF directly, because legal issues require that the PDF closely
resembles the document 'as seen by the user'.

What happens is that I get an exception when I browse to either one of the
matchers (TestPdf or TestXhtml):

org.apache.cocoon.ProcessingException
org.apache.cocoon.ProcessingException: Failed to execute pipeline.: http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent:25:-1:org.xml.sax.SAXParseException: Using original entity definition for "&quot;".
    at org.apache.cocoon.components.pipeline.impl.AbstractCachingProcessingPipeline.processXMLPipeline(AbstractCachingProcessingPipeline.java:283)
    at org.apache.cocoon.components.pipeline.AbstractProcessingPipeline.process(AbstractProcessingPipeline.java:490)
    at org.apache.cocoon.components.treeprocessor.sitemap.SerializeNode.invoke(SerializeNode.java:120)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:49)
    at org.apache.cocoon.components.treeprocessor.sitemap.PreparableMatchNode.invoke(PreparableMatchNode.java:130)
    at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:72)
    at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:126)
..... a lot more

Caused by:
org.xml.sax.SAXParseException: Using original entity definition for "&quot;".
    at org.apache.crimson.parser.Parser2.warning(Parser2.java:3148)
    at org.apache.crimson.parser.Parser2.maybeEntityDecl(Parser2.java:2643)
    at org.apache.crimson.parser.Parser2.maybeMarkupDecl(Parser2.java:1210)
    at org.apache.crimson.parser.Parser2.externalParameterEntity(Parser2.java:2898)
    at org.apache.crimson.parser.Parser2.maybePEReference(Parser2.java:2574)
    at org.apache.crimson.parser.Parser2.externalParameterEntity(Parser2.java:2895)
    at org.apache.crimson.parser.Parser2.maybeDoctypeDecl(Parser2.java:1167)
    at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:489)
    at org.apache.crimson.parser.Parser2.parse(Parser2.java:305)
    at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:442)
    at org.apache.excalibur.xml.impl.JaxpParser.parse(JaxpParser.java:296)
    at org.apache.excalibur.xml.impl.JaxpParser.parse(JaxpParser.java:315)
    at org.apache.cocoon.components.source.SourceUtil.parse(SourceUtil.java:252)
    at org.apache.cocoon.generation.FileGenerator.generate(FileGenerator.java:116)
...... a lot more again
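
One thing the trace suggests: the innermost frame of the cause is Parser2.warning, so the exception seems to start life as a parser warning (a redeclaration of the predefined "&quot;" entity in xhtml-special.ent) that some error handler then turns into a failure. As a rough illustration of the distinction, here is a standalone SAX ErrorHandler sketch that records warnings but still fails on errors. The class name LenientErrorHandler is made up for this example; it is plain JAXP/SAX, not Cocoon's or Excalibur's own handler, and whether (or how) something like it can be plugged into Cocoon 2.1.5 is exactly what is unknown here.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical handler: tolerates warnings, rethrows real errors.
class LenientErrorHandler implements ErrorHandler {
    // Messages of all warnings seen so far, kept for later inspection.
    final List<String> warnings = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        // A warning (e.g. a duplicate entity declaration) is recorded, not thrown.
        warnings.add(e.getMessage());
    }

    @Override
    public void error(SAXParseException e) throws SAXParseException {
        throw e; // recoverable errors still abort the parse
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        throw e; // well-formedness violations always abort the parse
    }

    public static void main(String[] args) throws Exception {
        LenientErrorHandler handler = new LenientErrorHandler();
        handler.warning(new SAXParseException(
                "Using original entity definition for \"&quot;\".", null));
        System.out.println("warnings recorded: " + handler.warnings.size());
    }
}
```

With a plain SAX XMLReader, such a handler would be installed through setErrorHandler before calling parse.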

When I remove the document type definition from the HTML, then the error
does not occur (and I can view the HTML or PDF versions of the page in a
browser). But removing the DOCTYPE declaration is not an option, because I
need (or want) to use entity references such as &nbsp; and the like.

Basically, the HTML file looks like:
<?xml version="1.0" ?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>TITLE</title>
</head>
<body>
<!-- body content, a lot of tables, &nbsp;'s, &Eacute;'s and more -->
</body>
</html>

When I remove the DOCTYPE declaration, all works fine if there are no entity
references in the HTML. If I add a &nbsp; or something and I leave the
DOCTYPE out, I get another exception thrown by the parser (because obviously
the referenced entity could not be found).
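
The missing-entity behaviour can be reproduced outside Cocoon with a small sketch. It assumes the JAXP parser bundled with a current JDK rather than Crimson 1.1.3, and the class name EntityDemo and the sample documents are made up for illustration. The second document declares nbsp in an internal DTD subset purely as a contrast; that has not been tried inside the Cocoon pipeline.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: checks whether a document parses without error.
class EntityDemo {
    // Returns true if the parser accepts the document, false if it rejects it.
    static boolean parses(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false); // non-validating; entities must still be declared
        try {
            factory.newSAXParser().parse(
                    new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // e.g. a reference to an undeclared entity
        }
    }

    public static void main(String[] args) throws Exception {
        // No DOCTYPE at all: &nbsp; is not one of the five predefined XML entities.
        String noDoctype = "<html><body>a&nbsp;b</body></html>";
        // Internal subset that declares nbsp itself, with no external DTD.
        String internalSubset =
                "<!DOCTYPE html [<!ENTITY nbsp \"&#160;\">]>"
                + "<html><body>a&nbsp;b</body></html>";

        System.out.println(parses(noDoctype));      // prints "false"
        System.out.println(parses(internalSubset)); // prints "true"
    }
}
```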

I have Crimson 1.1.3 and Cocoon 2.1.5.

Has anyone ever encountered this problem? And, the real Cocoon question: can
I get Cocoon to NOT attempt to validate the XML that comes from the
generator?

Any help on this will be very much appreciated! I am kind of stuck on this,
and searching other resources didn't help.



All the best,
Friso van Vollenhoven
MP Objects


