manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Document connector excluding mime type and size - Tika Parser error
Date Tue, 09 Jan 2018 11:39:15 GMT
What version of MCF is this?  That's important to know since Tika has had
problems with this kind of thing in the past and this looks like something
similar.

The problem you are reporting is due to either a missing jar or a bug in
an internal Tika classloader.  But I need to know whether this is a current
bug or not, since we just went to a new Tika version.
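
For anyone trying to narrow this down: a NoSuchMethodError whose descriptor names a return type generally means the class found at runtime does not match the version the caller was compiled against. The following is only a minimal sketch for checking that, not part of MCF or Tika. The class name JarConflictCheck and its helper methods are hypothetical, and it assumes you run it with the same jars on the classpath that your MCF installation loads. The POI class and method names are taken from the stack trace below.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical diagnostic helper, not shipped with ManifoldCF, Tika, or POI.
public class JarConflictCheck {

    /** Returns the jar or directory a class was loaded from, or a placeholder if unknown. */
    static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    /** Returns the return type name of a public no-argument method, or null if there is none. */
    static String returnTypeOf(Class<?> cls, String methodName) {
        try {
            Method method = cls.getMethod(methodName);
            return method.getReturnType().getName();
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Defaults come from the reported stack trace; override them on the command line.
        String className = args.length > 0 ? args[0] : "org.apache.poi.hwmf.record.HwmfFont";
        String methodName = args.length > 1 ? args[1] : "getCharSet";
        try {
            // Load without initializing the class; we only want to inspect it.
            Class<?> cls = Class.forName(className, false, JarConflictCheck.class.getClassLoader());
            System.out.println("Loaded from: " + locationOf(cls));
            System.out.println(methodName + "() returns: " + returnTypeOf(cls, methodName));
        } catch (ClassNotFoundException | LinkageError e) {
            System.out.println(className + " could not be loaded: " + e);
        }
    }
}
```

If the printed return type is not org.apache.poi.hwmf.record.HwmfFont$WmfCharset, or the class comes from a jar you did not expect, that points at a mismatched or duplicate POI jar rather than at the documents themselves.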

Karl


On Tue, Jan 9, 2018 at 4:32 AM, msaunier <msaunier@citya.com> wrote:

> Hello Karl,
>
> I hope you are well today.
>
>
>
> I have 2 problems with ManifoldCF.
>
>
>
> -----------
>
> In **Output connectors**, with the Solr connector, I have added a « Maximum
> document length » and I have « Excluded 5 mime types », but it does not
> work. I have attached a screenshot.
>
>
>
> ----------
>
> Second, I have a **Tika exception** in ManifoldCF. 3 documents are
> blocked:
>
>
>
> FATAL 2018-01-09T10:19:54,992 (Worker thread '5') - Error tossed: org.apache.poi.hwmf.record.HwmfFont.getCharSet()Lorg/apache/poi/hwmf/record/HwmfFont$WmfCharset;
> java.lang.NoSuchMethodError: org.apache.poi.hwmf.record.HwmfFont.getCharSet()Lorg/apache/poi/hwmf/record/HwmfFont$WmfCharset;
>         at org.apache.tika.parser.microsoft.WMFParser.parse(WMFParser.java:74) ~[?:?]
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135) ~[?:?]
>         at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72) ~[?:?]
>         at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:102) ~[?:?]
>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedFile(AbstractOOXMLExtractor.java:375) ~[?:?]
>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedPart(AbstractOOXMLExtractor.java:260) ~[?:?]
>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:205) ~[?:?]
>         at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:142) ~[?:?]
>         at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:142) ~[?:?]
>         at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106) ~[?:?]
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135) ~[?:?]
>         at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]
>         at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]
>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]
>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]
>         at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]
>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
>
>
>
> Do I need to create an incident ticket?
>
>
>
> ----------
>
>
>
> Thanks for your help.
>
>
>
> Best regards,
