Return-Path:
X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io
Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io
Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 31310200B44 for ; Thu, 30 Jun 2016 02:43:14 +0200 (CEST)
Received: by cust-asf.ponee.io (Postfix) id 2FE06160A6E; Thu, 30 Jun 2016 00:43:14 +0000 (UTC)
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 78195160A57 for ; Thu, 30 Jun 2016 02:43:13 +0200 (CEST)
Received: (qmail 27172 invoked by uid 500); 30 Jun 2016 00:43:12 -0000
Mailing-List: contact dev-help@manifoldcf.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@manifoldcf.apache.org
Delivered-To: mailing list dev@manifoldcf.apache.org
Received: (qmail 27161 invoked by uid 99); 30 Jun 2016 00:43:12 -0000
Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 30 Jun 2016 00:43:12 +0000
Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id 1DA5B2C02A1 for ; Thu, 30 Jun 2016 00:43:12 +0000 (UTC)
Date: Thu, 30 Jun 2016 00:43:12 +0000 (UTC)
From: "Phil (JIRA)"
To: dev@manifoldcf.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Reopened] (CONNECTORS-1325) Invalid XML character causing job to abort
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
archived-at: Thu, 30 Jun 2016 00:43:14 -0000

[ https://issues.apache.org/jira/browse/CONNECTORS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phil reopened CONNECTORS-1325:
------------------------------

Hi [~daddywri],

I'm finding after installing the patch that it does ignore the error.
However, the crawler keeps attempting to process this document (or at least its metadata), so the crawl never finishes. It has now been running for a few days.

I tailed the logs for a particular document using the following:
{{tail -f manifoldcf.log | grep ""}}

Which resulted in the following lines being repeated:
{code}
DEBUG 2016-06-30 09:59:32,928 (Worker thread '13') sharepoint.SharePointRepository - SharePoint: Finding metadata to include for document/item
DEBUG 2016-06-30 09:59:32,946 (Worker thread '13') sharepoint.SPSProxyHelper - SharePoint: In getFieldValues; fieldNames= ....
DEBUG 2016-06-30 09:59:33,100 (Worker thread '27') sharepoint.SharePointRepository - SharePoint: Getting version of
DEBUG 2016-06-30 09:59:33,100 (Worker thread '27') sharepoint.SharePointRepository - SharePoint: Checking whether to include list item ....
.....
....
{code}

I've omitted some repository-specific details, but let me know if you want any further information.

Any idea why this might be happening?

Thanks

> Invalid XML character causing job to abort
> ------------------------------------------
>
> Key: CONNECTORS-1325
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1325
> Project: ManifoldCF
> Issue Type: Bug
> Components: SharePoint connector
> Affects Versions: ManifoldCF 2.3
> Reporter: Phil
> Assignee: Karl Wright
> Priority: Blocker
> Fix For: ManifoldCF 2.5
>
> Attachments: CONNECTORS-1325.patch
>
>
> The following error is causing the Manifold job to abort, and subsequently the job not being able to finish.
> It would be good to have the crawler log this error, but not throw an exception which causes the entire job to stop.
> {code}
> ERROR 2016-06-21 19:01:54,562 (Worker thread '6') system.WorkerThread - Exception tossed: XML parsing error: Character reference "�" is an invalid XML character.
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: XML parsing error: Character reference "�" is an invalid XML character.
> at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:390)
> at org.apache.manifoldcf.core.common.XMLDoc.(XMLDoc.java:286)
> at org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getFieldValues(SPSProxyHelper.java:2039)
> at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:974)
> at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> Caused by: org.xml.sax.SAXParseException; lineNumber: 18; columnNumber: 64; Character reference "�" is an invalid XML character.
> at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
> at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
> at org.apache.manifoldcf.core.common.XMLDoc.init(XMLDoc.java:359)
> ... 4 more
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
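
For readers of the archive: the SAXParseException quoted above is raised when a numeric character reference names a code point that XML 1.0 does not allow. One possible way to avoid the exception is to strip such references from the response text before it reaches the parser. The sketch below only illustrates that idea and is not the attached CONNECTORS-1325.patch; the class name XmlCharRefSanitizer, its method names, and the sample input are hypothetical and are not part of ManifoldCF.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of ManifoldCF): removes numeric character
// references that point at code points XML 1.0 forbids.
final class XmlCharRefSanitizer {
    // Matches decimal (&#31;) and hexadecimal (&#x1F;) character references.
    private static final Pattern CHAR_REF =
        Pattern.compile("&#(x[0-9A-Fa-f]+|[0-9]+);");

    // XML 1.0 "Char" production: tab, LF, CR, and the listed ranges.
    static boolean isValidXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the input with every invalid character reference removed;
    // valid references and all other text are copied through unchanged.
    static String stripInvalidCharRefs(String xml) {
        Matcher m = CHAR_REF.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            boolean valid;
            try {
                int cp = body.startsWith("x")
                    ? Integer.parseInt(body.substring(1), 16)
                    : Integer.parseInt(body, 10);
                valid = isValidXml10Char(cp);
            } catch (NumberFormatException e) {
                valid = false; // value too large to be a code point
            }
            m.appendReplacement(out,
                valid ? Matcher.quoteReplacement(m.group()) : "");
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up sample input, not taken from the SharePoint response.
        String raw = "<row title=\"bad&#x0;value&#10;ok\"/>";
        System.out.println(stripInvalidCharRefs(raw));
    }
}
```

A caller would run the raw text through stripInvalidCharRefs before parsing and could log whenever the result differs from the input, so the problem is recorded without aborting the job. Whether dropping the reference or replacing it with a placeholder is preferable depends on how the field value is used downstream.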