cocoon-users mailing list archives

From "Anders Conrad"
Subject Parsing problem
Date Thu, 28 Jun 2001 10:10:04 GMT

After successfully installing Cocoon 1.8.2, I get parsing errors when trying to transform
XML documents into HTML with the Cocoon XSLT processor. The documents are encoded in UTF-8,
and the problem is caused by element names containing the Danish letters æ, ø and å (ae,
oe, aa). The same letters are parsed correctly in ordinary text content; the problem only
occurs in element names. The documents in question parse without problems in XML Spy and
James Clark's SP.

Has anyone else encountered this problem? Of course, an obvious solution is to avoid non-English
characters in element names, but this may require extensive filtering of existing texts,
changes to DTDs, etc. To the best of my knowledge, non-English characters should be allowed in
XML element names.
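
As a quick check of that claim, here is a minimal sketch (not part of the original report) that parses the DTD declarations from this message, plus a hypothetical one-element test document, with the validating parser bundled in a modern JDK (Java 8 or later because of the lambda, so not the JDK 1.1.8 / Cocoon 1.8.2 setup described here). The class name `Utf8NameCheck`, the document text and the in-memory `EntityResolver` are illustrative assumptions. If the input really is decoded as UTF-8, validation should succeed and the child element should come back named `række`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical check: does a conforming, validating parser accept
// non-ASCII element names when the input really is UTF-8?
public class Utf8NameCheck {

    // Same declarations as the DTD in the report; \u00e6 is the letter ae.
    static final String DTD =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!ELEMENT simpledoc (r\u00e6kke)>\n"
        + "<!ELEMENT r\u00e6kke (#PCDATA)>\n";

    // Hypothetical test document; the original one was not posted.
    static final String DOC =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE simpledoc SYSTEM \"charbug.dtd\">\n"
        + "<simpledoc><r\u00e6kke>some text</r\u00e6kke></simpledoc>\n";

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Parse DOC with DTD validation and return the name of the root's first child.
    static String childName() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Serve the DTD from memory instead of resolving "charbug.dtd" on disk.
        builder.setEntityResolver((publicId, systemId) ->
                new InputSource(new ByteArrayInputStream(utf8(DTD))));

        // By default validity errors are only reported; make every error fatal.
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) throws SAXParseException { throw e; }
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });

        Document doc = builder.parse(new ByteArrayInputStream(utf8(DOC)));
        return doc.getDocumentElement().getFirstChild().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(childName().equals("r\u00e6kke") ? "parsed OK" : "unexpected name");
    }
}
```

The non-ASCII letters are written as `\u00e6` escapes so the sketch does not depend on the encoding the Java source file is compiled with.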

The platform is: Suse Linux 7.1, Apache 1.3.14, Tomcat 3.2.2, JDK 1.1.8.

The error stack is the following:
org.xml.sax.SAXException: A ')' is required in the declaration of element type "simpledoc".
[FATAL ERROR] [File: "file:/var/jakarta-tomcat-3.2.2/webapps/cocoon/diplo/charbug.dtd" Line:
3 Column: 24] (nested exception: org.xml.sax.SAXParseException: A ')' is required in the declaration
of element type "simpledoc". )
	at org.apache.cocoon.parser.AbstractParser.fatalError(
	at org.apache.xerces.framework.XMLParser.reportError(
	at org.apache.xerces.framework.XMLDTDScanner.reportFatalXMLError(
	at org.apache.xerces.framework.XMLDTDScanner.scanChildren(
	at org.apache.xerces.framework.XMLDTDScanner.scanElementDecl(
	at org.apache.xerces.framework.XMLDTDScanner.scanDecls(
	at org.apache.xerces.framework.XMLDocumentScanner.scanDoctypeDecl(
	at org.apache.xerces.framework.XMLDocumentScanner.access$0(
	at org.apache.xerces.framework.XMLDocumentScanner$PrologDispatcher.dispatch(
	at org.apache.xerces.framework.XMLDocumentScanner.parseSome(
	at org.apache.xerces.framework.XMLParser.parse(
	at org.apache.cocoon.parser.XercesParser.parse(
	at org.apache.cocoon.parser.AbstractParser.parse(
	at org.apache.cocoon.producer.ProducerFromFile.getDocument(
	at org.apache.cocoon.Engine.handle(
	at org.apache.cocoon.Cocoon.service(
	at javax.servlet.http.HttpServlet.service(
	at org.apache.tomcat.core.ServletWrapper.doService(
	at org.apache.tomcat.core.Handler.service(
	at org.apache.tomcat.core.ServletWrapper.service(
	at org.apache.tomcat.core.ContextManager.internalService(
	at org.apache.tomcat.core.ContextManager.service(
	at org.apache.tomcat.service.connector.Ajp13ConnectionHandler.processConnection(
	at org.apache.tomcat.service.TcpWorkerThread.runIt(
	at org.apache.tomcat.util.ThreadPool$

The DTD in question looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!-- edited with XML Spy v3.5 NT ( by Anders Conrad (DSL) -->
<!ELEMENT simpledoc (række)>
<!ELEMENT række (#PCDATA)>
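
One possible explanation, offered only as an unconfirmed hypothesis: somewhere in the chain the UTF-8 bytes are read as ISO-8859-1 (Latin-1). In UTF-8 the letter æ is the two bytes 0xC3 0xA6; read as Latin-1 they become Ã followed by ¦ (BROKEN BAR), and ¦ cannot appear in an XML name, so the name scanner stops there and then expects ')'. The sketch below (hypothetical class `MisdecodedName`, Java 7 or later for `StandardCharsets`) replays that on the `<!ELEMENT simpledoc (række)>` declaration. Its name-character test is an approximation of the XML 1.0 tables using `Character.isLetterOrDigit`; it differs from the spec for a few characters such as U+00B7, but agrees on the ones involved here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: what a name scanner would see if UTF-8 input were
// decoded as ISO-8859-1. An assumed cause, not a confirmed diagnosis.
public class MisdecodedName {

    // Encode the text as UTF-8, then (wrongly) decode those bytes as Latin-1.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Approximation of the XML 1.0 NameChar production: letters, digits,
    // '.', '-', '_' and ':'. The spec uses explicit Unicode tables instead.
    static boolean isApproxNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '.' || c == '-' || c == '_' || c == ':';
    }

    // Scan a name starting at 0-based index `start`; return the 1-based
    // column of the first character that cannot continue the name, or -1.
    static int firstNonNameColumn(String line, int start) {
        for (int i = start; i < line.length(); i++) {
            if (!isApproxNameChar(line.charAt(i))) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The element declaration from the DTD; \u00e6 is the letter ae.
        String decl = "<!ELEMENT simpledoc (r\u00e6kke)>";
        int nameStart = decl.indexOf('(') + 1; // the name begins right after '('

        // Prints 27: the scanner stops at the ')' after the whole name.
        System.out.println("decoded as UTF-8:   first non-name character at column "
                + firstNonNameColumn(decl, nameStart));
        // Prints 24: the scanner stops at the broken bar inside the name.
        System.out.println("decoded as Latin-1: first non-name character at column "
                + firstNonNameColumn(misdecode(decl), nameStart));
    }
}
```

Under the mis-decoding, the first illegal name character lands at column 24, the same column the error reports; decoded correctly, the name runs on to the `)` at column 27. That agreement is suggestive rather than proof, since it assumes the parser counts columns from 1 in the same way. ø (0xC3 0xB8) and å (0xC3 0xA5) would likewise turn into Ã¸ and Ã¥, and it would also fit the observation that the letters cause no error in character data, where Ã and ¦ are both legal characters.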

The workaround is to change it to the following (with the corresponding change in the test document):

<?xml version="1.0" encoding="UTF-8"?>
<!-- edited with XML Spy v3.5 NT ( by Anders Conrad (DSL) -->
<!ELEMENT simpledoc (raekke)>
<!ELEMENT raekke (#PCDATA)>

I have a complete reproducible test case available in case someone is interested.

Any suggestions or commentary would be welcome!

Anders Conrad
IT editor, cand.mag.
Det Danske Sprog- og Litteraturselskab
Christians Brygge 1, 1219 København K
Tel. 33 13 06 60
