manifoldcf-dev mailing list archives

From "Konstantin Avdeev (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CONNECTORS-1325) Invalid XML character causing job to abort
Date Thu, 13 Oct 2016 08:44:20 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571237#comment-15571237 ]

Konstantin Avdeev edited comment on CONNECTORS-1325 at 10/13/16 8:43 AM:
-------------------------------------------------------------------------

The Stack Overflow thread you mentioned in the second message here describes the problem quite well:
character references to control characters such as our record separator (U+001E) were introduced in XML 1.1 (https://www.w3.org/TR/xml11/#sec-xml11),
and a possible solution is to set the correct header: {code}<?xml version="1.1"?>{code}
I'm afraid it would take ages to get this fixed by MS.

P.S. The correct XML prologue won't help with emojis, but at least it would solve the issue with our "record separator" :)
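To illustrate the point about the prologue, here is a minimal Java sketch. It assumes the JDK's built-in JAXP parser, which (like the Apache Xerces parser in the stack trace below) implements XML 1.1; the class and method names are illustrative only, and the three documents are made up. Per the two specifications, a reference to the record separator, `&#x1E;`, should be rejected under an XML 1.0 prologue but accepted under `<?xml version="1.1"?>`, while the lone surrogate `&#xD83D;` from the stack trace is not a legal character in either version.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative only: checks whether a string is well-formed XML. */
final class XmlVersionDemo {
    private XmlVersionDemo() {}

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors but does not print them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // Not well-formed, e.g. "Character reference ... is an invalid XML character".
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // XML 1.0 forbids U+001E, even when written as a character reference.
        System.out.println(isWellFormed("<?xml version=\"1.0\"?><a>&#x1E;</a>"));
        // XML 1.1 allows control characters when written as character references.
        System.out.println(isWellFormed("<?xml version=\"1.1\"?><a>&#x1E;</a>"));
        // A lone UTF-16 surrogate is not a character in either version.
        System.out.println(isWellFormed("<?xml version=\"1.1\"?><a>&#xD83D;</a>"));
    }
}
```

The only difference between the first two inputs is the version in the prologue, which is why changing the header on the server side would fix the record separator but not the emoji.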

To be honest, I'm not sure what we could do here; I'm not a fan of workarounds. We could leave it as it is now, but could you perhaps raise the "bad character" messages to WARN level?
Currently they are logged at DEBUG only, which could be misleading in a production environment.
Thanks!


was (Author: kavdeev):
The Stack Overflow thread you mentioned in the second message here describes the problem quite well:
this character encoding was introduced in XML 1.1: https://www.w3.org/TR/xml11/#sec-xml11
and the solution is to set the correct header: {code}<?xml version="1.1"?>{code}
I'm afraid it would take ages to get this fixed by MS.

> Invalid XML character causing job to abort
> ------------------------------------------
>
>                 Key: CONNECTORS-1325
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1325
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: SharePoint connector
>    Affects Versions: ManifoldCF 2.3
>            Reporter: Phil
>            Assignee: Karl Wright
>            Priority: Blocker
>             Fix For: ManifoldCF 2.5
>
>         Attachments: CONNECTORS-1325-2.patch, CONNECTORS-1325-3.patch, CONNECTORS-1325.patch, mcf-bad-ms-char.xml
>
>
> The following error causes the ManifoldCF job to abort, so the job is never able to finish.
> It would be good to have the crawler log this error rather than throw an exception that stops the entire job.
> {code}
> ERROR 2016-06-21 19:01:54,562 (Worker thread '6') system.WorkerThread - Exception tossed: XML parsing error: Character reference "&#xD83D" is an invalid XML character.
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing error: Character reference "&#xD83D" is an invalid XML character.
>         at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:390)
>         at org.apache.manifoldcf.core.common.XMLDoc.<init>(XMLDoc.java:286)
>         at org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getFieldValues(SPSProxyHelper.java:2039)
>         at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:974)
>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> Caused by: org.xml.sax.SAXParseException; lineNumber: 18; columnNumber: 64; Character reference "&#xD83D" is an invalid XML character.
>         at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>         at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>         at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
>         at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:359)
>         ... 4 more
> {code}
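U+D83D in the stack trace is a UTF-16 high surrogate, the first half of the two-unit encoding of an emoji such as U+1F600. The error suggests (this is an inference, not confirmed here) that the response writes each UTF-16 unit as its own character reference, which neither XML 1.0 nor XML 1.1 accepts. The reporter asks for the crawler to log such a character instead of aborting. Below is a minimal sketch of one possible pre-parse workaround, not necessarily what the attached patches do, assuming the response is available as a string before it reaches the parser: it joins an adjacent high/low surrogate pair into a single reference to the real code point and drops any other reference to a non-XML 1.0 character, logging each drop at WARNING. `CharRefSanitizer` is a hypothetical name, and `java.util.logging` stands in for whatever logger the connector actually uses. The sketch only handles numeric references (not raw control characters) and does not skip CDATA sections or comments.

```java
import java.util.logging.Logger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-parse filter; not part of ManifoldCF. */
final class CharRefSanitizer {
    private static final Logger LOG = Logger.getLogger(CharRefSanitizer.class.getName());

    // A hexadecimal (&#x1E;) or decimal (&#30;) numeric character reference.
    private static final Pattern CHAR_REF = Pattern.compile("&#(?:x([0-9A-Fa-f]+)|([0-9]+));");

    private CharRefSanitizer() {}

    /** The "Char" production of XML 1.0. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Code point named by the current match, or -1 if it does not fit in an int. */
    private static int codePoint(Matcher m) {
        try {
            return m.group(1) != null ? Integer.parseInt(m.group(1), 16) : Integer.parseInt(m.group(2));
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /**
     * Joins a surrogate pair such as &#xD83D;&#xDE00; into one reference (&#x1F600;)
     * and drops, with a WARNING, any other reference to a non-XML 1.0 character.
     */
    static String sanitize(String xml) {
        Matcher m = CHAR_REF.matcher(xml);
        StringBuilder out = new StringBuilder(xml.length());
        int pos = 0; // everything before pos has already been copied or dropped
        while (m.find(pos)) {
            int end = m.end();
            String ref = m.group();
            int cp = codePoint(m);
            out.append(xml, pos, m.start());
            pos = end;
            if (isXml10Char(cp)) {
                out.append(ref); // legal reference: keep it unchanged
                continue;
            }
            // A high surrogate immediately followed by a low surrogate reference.
            if (cp >= 0xD800 && cp <= 0xDBFF && m.find(end) && m.start() == end) {
                int low = codePoint(m);
                if (low >= 0xDC00 && low <= 0xDFFF) {
                    out.append(String.format("&#x%X;", Character.toCodePoint((char) cp, (char) low)));
                    pos = m.end();
                    continue;
                }
            }
            LOG.warning("Dropping invalid XML character reference " + ref);
        }
        out.append(xml, pos, xml.length());
        return out.toString();
    }
}
```

With this filter, a complete emoji would survive as `&#x1F600;`, while a stray `&#x1E;` or an unpaired `&#xD83D;` would be removed with a warning instead of aborting the job. Whether silently dropping characters is acceptable is the same trade-off the comment above raises about workarounds.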



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
