manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <>
Subject [jira] [Commented] (CONNECTORS-1325) Invalid XML character causing job to abort
Date Thu, 23 Jun 2016 06:18:16 GMT


Karl Wright commented on CONNECTORS-1325:

I am regularly astounded at how often Microsoft simply ignores universal specifications like
XML.  Basically they are blowing up the Xerces parser, which is pretty close to being the
reference implementation for XML parsers.

I'll have to figure out how best to deal with situations like this.  It's occurring when the
connector is reading metadata information for a document, and therefore many documents you
are reading might be affected.
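
One possible way to deal with this, offered only as a sketch and not as the fix that was
adopted for this issue: pre-filter the response text before it reaches the parser, replacing
any numeric character reference that XML 1.0 does not allow (such as the lone surrogate
&#xD83D;) with U+FFFD. The class and method names below are hypothetical and are not part
of ManifoldCF.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ManifoldCF.
final class XmlCharRefSanitizer {
    // Matches numeric character references such as &#xD83D; or &#55357;
    private static final Pattern CHAR_REF =
        Pattern.compile("&#(x[0-9A-Fa-f]+|[0-9]+);");

    // The Char production from the XML 1.0 specification.
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Replaces every invalid numeric character reference with &#xFFFD;
    // and leaves valid references and all other text untouched.
    static String sanitize(String xml) {
        Matcher m = CHAR_REF.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            int cp;
            try {
                cp = body.charAt(0) == 'x'
                    ? Integer.parseInt(body.substring(1), 16)
                    : Integer.parseInt(body, 10);
            } catch (NumberFormatException e) {
                cp = -1; // value too large to be a code point
            }
            String replacement = isValidXml10(cp) ? m.group() : "&#xFFFD;";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(sanitize("<v>&#xD83D;&#xDE00; ok &#x41;</v>"));
    }
}
```

This loses the original character. A variant could instead combine a high/low surrogate
pair of references into a single reference to the supplementary code point, which is valid
XML.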

> Invalid XML character causing job to abort
> ------------------------------------------
>                 Key: CONNECTORS-1325
>                 URL:
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: SharePoint connector
>    Affects Versions: ManifoldCF 2.3
>            Reporter: Phil
>            Priority: Blocker
> The following error causes the ManifoldCF job to abort, so the job is never able to finish.
> It would be good to have the crawler log this error rather than throw an exception that
> stops the entire job.
> {code}
> ERROR 2016-06-21 19:01:54,562 (Worker thread '6') system.WorkerThread - Exception tossed: XML parsing error: Character reference "&#xD83D" is an invalid XML character.
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing error: Character reference "&#xD83D" is an invalid XML character.
>         at org.apache.manifoldcf.core.common.XMLDoc.init(
>         at org.apache.manifoldcf.core.common.XMLDoc.<init>(
>         at org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getFieldValues(
>         at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(
>         at
> Caused by: org.xml.sax.SAXParseException; lineNumber: 18; columnNumber: 64; Character reference "&#xD83D" is an invalid XML character.
>         at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
>         at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>         at javax.xml.parsers.DocumentBuilder.parse(
>         at org.apache.manifoldcf.core.common.XMLDoc.init(
>         ... 4 more
> {code}
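
The reporter's request (log the parse error and keep going) could look roughly like the
following. This is a minimal sketch using only the JDK's built-in JAXP parser. The class
and method names are hypothetical, and it does not reflect how ManifoldCF's XMLDoc or the
SharePoint connector are actually structured.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of ManifoldCF.
final class TolerantMetadataParser {
    // Parses each XML string; a document that fails to parse is logged
    // and skipped instead of aborting the whole batch.
    static List<Document> parseAll(List<String> xmlDocs) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        List<Document> parsed = new ArrayList<>();
        for (String xml : xmlDocs) {
            try {
                byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
                parsed.add(factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes)));
            } catch (SAXException e) {
                // Malformed XML, e.g. an invalid character reference.
                System.err.println("Skipping unparseable metadata: " + e.getMessage());
            } catch (ParserConfigurationException | IOException e) {
                // Not a content problem, so do not swallow it.
                throw new IllegalStateException(e);
            }
        }
        return parsed;
    }
}
```

Only SAXException is treated as skippable here, so configuration and I/O failures still
surface to the caller.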

This message was sent by Atlassian JIRA