manifoldcf-dev mailing list archives

From Steph van Schalkwyk <st...@remcam.net>
Subject Re: [jira] [Resolved] (CONNECTORS-1518) MCF shutting down when Tika is used
Date Fri, 27 Jul 2018 01:45:12 GMT
Hi Karl,
Thank you for the feedback.
It seems this is not confined to the ES connector.
I configured a Filesystem-to-Filesystem job with Allowed Docs and HTML
parse transformations.
When I created a Tika parser and added it to the pipeline, MCF started
crashing again:
```
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3816)
at java.base/java.util.BitSet.ensureCapacity(BitSet.java:338)
at java.base/java.util.BitSet.expandTo(BitSet.java:353)
at java.base/java.util.BitSet.set(BitSet.java:448)
at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
at org.apache.tika.parser.microsoft.TextCell.render(TextCell.java:34)
at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processSheet(ExcelExtractor.java:609)
at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.internalProcessRecord(ExcelExtractor.java:392)
at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processRecord(ExcelExtractor.java:343)
at org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.processRecord(FormatTrackingHSSFListener.java:92)
at org.apache.poi.hssf.eventusermodel.HSSFRequest.processRecord(HSSFRequest.java:109)
at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:179)
at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:136)
at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processFile(ExcelExtractor.java:319)
at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:170)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:184)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
[Thread-479] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@c446b14{HTTP/1.1}{0.0.0.0:8345}
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3816)
at java.base/java.util.BitSet.ensureCapacity(BitSet.java:338)
at java.base/java.util.BitSet.expandTo(BitSet.java:353)
at java.base/java.util.BitSet.set(BitSet.java:448)
at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
at org.apache.tika.parser.microsoft.TextCell.render(TextCell.java:34)
at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processSheet(ExcelExtractor.java:609)
at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.internalProcessRecord(ExcelExtractor.java:392)
at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processRecord(ExcelExtractor.java:343)
at org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.processRecord(FormatTrackingHSSFListener.java:92)
at org.apache.poi.hssf.eventusermodel.HSSFRequest.processRecord(HSSFRequest.java:109)
at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:179)
at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:136)
at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processFile(ExcelExtractor.java:319)
at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:170)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:184)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
[Thread-479] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@77e2a6e2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-5963547634777478937.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf/manifoldcf_single/././web/war/mcf-api-service.war}
[Thread-479] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@2237bada{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-2211982015800983312.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf/manifoldcf_single/././web/war/mcf-authority-service.war}
```
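For anyone reading the trace: the allocation that finally fails is inside java.util.BitSet (set, expandTo, ensureCapacity, Arrays.copyOf). A BitSet keeps its bits in one contiguous array, and setting a bit beyond the current capacity copies that array into a larger one, so a single large index forces a correspondingly large allocation. Below is a minimal stand-alone sketch of that growth behaviour. It is not MCF, Tika or boilerpipe code, the class name BitSetGrowth is made up for illustration, and it does not establish why the boilerpipe handler reaches such an index for this spreadsheet.

```java
import java.util.BitSet;

// Stand-alone illustration only; BitSetGrowth is a hypothetical name.
class BitSetGrowth {
    // Sets one bit at the given index in a fresh BitSet and returns the
    // number of bits of backing storage the BitSet occupies afterwards.
    static int backingBitsAfterSet(int index) {
        BitSet bits = new BitSet();  // starts with a small backing array
        bits.set(index);             // grows the backing array if index is out of range
        return bits.size();          // size() reports allocated bits, not the count of set bits
    }

    public static void main(String[] args) {
        // One set() call per index; the allocated storage tracks the index, not the data volume.
        for (int index : new int[] {10, 1_000_000, 100_000_000}) {
            System.out.println("index " + index + " -> "
                + backingBitsAfterSet(index) + " bits allocated");
        }
    }
}
```

If the index being set tracks the number of characters or text blocks seen so far, a very large sheet would keep this array growing for as long as the parse runs, which may be what the trace shows.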
This is in the log:
```
WARN 2018-07-26T20:26:03,411 (Worker thread '24') - no processing, mime type not html
WARN 2018-07-26T20:26:04,327 (Worker thread '19') - no processing, mime type not html
WARN 2018-07-26T20:26:06,057 (Worker thread '21') - no processing, mime type not html
WARN 2018-07-26T20:26:06,947 (Worker thread '18') - no processing, mime type not html
WARN 2018-07-26T20:26:08,341 (Worker thread '20') - no processing, mime type not html
WARN 2018-07-26T20:26:08,703 (Worker thread '17') - no processing, mime type not html
WARN 2018-07-26T20:26:08,843 (Worker thread '16') - no processing, mime type not html
WARN 2018-07-26T20:26:17,091 (Worker thread '15') - no processing, mime type not html
WARN 2018-07-26T20:31:54,154 (Worker thread '0') - no processing, mime type not html
WARN 2018-07-26T20:31:54,163 (Worker thread '17') - no processing, mime type not html
WARN 2018-07-26T20:31:55,463 (Worker thread '16') - no processing, mime type not html
WARN 2018-07-26T20:31:55,846 (Worker thread '1') - no processing, mime type not html
WARN 2018-07-26T20:31:56,355 (Worker thread '18') - no processing, mime type not html
WARN 2018-07-26T20:31:57,843 (Worker thread '2') - no processing, mime type not html
WARN 2018-07-26T20:31:59,085 (Worker thread '3') - no processing, mime type not html
WARN 2018-07-26T20:31:59,548 (Worker thread '13') - no processing, mime type not html
WARN 2018-07-26T20:32:00,312 (Worker thread '9') - no processing, mime type not html
WARN 2018-07-26T20:32:00,625 (Worker thread '12') - no processing, mime type not html
WARN 2018-07-26T20:32:00,708 (Worker thread '4') - no processing, mime type not html
WARN 2018-07-26T20:32:01,175 (Worker thread '5') - no processing, mime type not html
WARN 2018-07-26T20:32:02,916 (Worker thread '6') - no processing, mime type not html
WARN 2018-07-26T20:32:16,667 (Worker thread '7') - no processing, mime type not html
WARN 2018-07-26T20:32:17,214 (Worker thread '11') - no processing, mime type not html
WARN 2018-07-26T20:32:17,962 (Worker thread '10') - no processing, mime type not html
WARN 2018-07-26T20:34:42,918 (Worker thread '14') - no processing, mime type not html
WARN 2018-07-26T20:34:42,931 (Worker thread '19') - no processing, mime type not html
WARN 2018-07-26T20:34:44,908 (Worker thread '12') - no processing, mime type not html
WARN 2018-07-26T20:34:44,991 (Worker thread '21') - no processing, mime type not html
WARN 2018-07-26T20:34:45,173 (Worker thread '10') - no processing, mime type not html
WARN 2018-07-26T20:38:51,831 (Worker thread '1') - no processing, mime type not html
WARN 2018-07-26T20:38:51,832 (Worker thread '0') - no processing, mime type not html
WARN 2018-07-26T20:41:00,635 (Worker thread '1') - no processing, mime type not html
WARN 2018-07-26T20:41:00,646 (Worker thread '0') - no processing, mime type not html
```
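The quoted issue below lists -Xms1g and -Xmx8g in start-options.env.unix. One way to confirm that a JVM launched with those options really sees that heap ceiling is a small stand-alone check like the following, run with the same -Xms/-Xmx flags. This is only a sketch under that assumption; HeapCheck is a hypothetical class name and is not part of ManifoldCF.

```java
// Stand-alone sketch; HeapCheck is a hypothetical name, not an MCF class.
class HeapCheck {
    // Maximum heap the JVM will attempt to use (reflects -Xmx when it is set).
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Heap currently in use: committed heap minus the free part of it.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        System.out.println("max heap (MiB):  " + maxHeapBytes() / mib);
        System.out.println("used heap (MiB): " + usedHeapBytes() / mib);
    }
}
```

Running it as `java -Xms1g -Xmx8g HeapCheck` should report a maximum close to the configured limit; a much smaller figure would suggest the options file is not being picked up.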



*Steph van Schalkwyk*
Principal, Remcam Search Engines
+1.314.452.2896    steph@remcam.net    http://remcam.net
Skype: svanschalkwyk    http://linkedin.com/in/vanschalkwyk

On Thu, Jul 26, 2018 at 8:00 PM, Karl Wright (JIRA) <jira@apache.org> wrote:

>
>      [ https://issues.apache.org/jira/browse/CONNECTORS-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Karl Wright resolved CONNECTORS-1518.
> -------------------------------------
>     Resolution: Fixed
>
> r1836769
>
> > MCF shutting down when Tika is used
> > -----------------------------------
> >
> >                 Key: CONNECTORS-1518
> >                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1518
> >             Project: ManifoldCF
> >          Issue Type: Bug
> >          Components: Tika extractor
> >    Affects Versions: ManifoldCF 2.10
> >         Environment: Centos 7
> > Prior to crash:
> > $ free -h
> >               total        used        free      shared  buff/cache   available
> > Mem:            15G        1.8G         12G         98M        1.1G         13G
> > Swap:          2.0G          0B        2.0G
> > After crash:
> > $ free -h
> >               total        used        free      shared  buff/cache   available
> > Mem:            15G         10G        4.0G         98M        1.1G        4.4G
> > Swap:          2.0G          0B        2.0G
> >
> > {{start-options.env.unix :}}
> > {{-Xss500m}}
> > {{-Xms1g}}
> > {{-Xmx8g}}
> > {{-Dorg.apache.manifoldcf.configfile=./properties.xml}}
> > {{-Dorg.apache.manifoldcf.jettyshutdowntoken=secret_token}}
> > {{-cp}}
> > {{.:./lib/mcf-core.jar:./lib/mcf-agents.jar:./lib/mcf-pull-
> agent.jar:./lib/mcf-ui-core.jar:./lib/mcf-jetty-runner.
> jar:./lib/jetty-continuation-9.2.3.v20140905.jar:./lib/
> jetty-http-9.2.3.v20140905.jar:./lib/jetty-io-9.2.3.
> v20140905.jar:./lib/jetty-jndi-9.2.3.v20140905.jar:./
> lib/jetty-jsp-jdt-2.3.3.jar:./lib/jetty-plus-9.2.3.
> v20140905.jar:./lib/jetty-schemas-3.1.M0.jar:./lib/jetty-security-9.2.3.
> v20140905.jar:./lib/jetty-server-9.2.3.v20140905.jar:./
> lib/jetty-servlet-9.2.3.v20140905.jar:./lib/jetty-
> util-9.2.3.v20140905.jar:./lib/jetty-webapp-9.2.3.
> v20140905.jar:./lib/jetty-xml-9.2.3.v20140905.jar:./lib/
> hsqldb-2.3.2.jar:./lib/postgresql-42.1.3.jar:./lib/
> commons-codec-1.10.jar:./lib/commons-collections-3.2.1.jar:
> ./lib/commons-collections4-4.1.jar:./lib/commons-discovery-
> 0.5.jar:./lib/commons-el-1.0.jar:./lib/commons-exec-1.3.
> jar:./lib/commons-fileupload-1.2.2.jar:./lib/commons-io-2.
> 5.jar:./lib/commons-lang-2.6.jar:./lib/commons-lang3-3.6.
> jar:./lib/commons-logging-1.2.jar:./lib/ecj-4.3.1.jar:./lib/
> gson-2.8.0.jar:./lib/guava-21.0.jar:./lib/httpclient-4.5.3.
> jar:./lib/httpcore-4.4.6.jar:./lib/jasper-6.0.35.jar:./lib/
> jasper-el-6.0.35.jar:./lib/javax.servlet-api-3.1.0.jar:./
> lib/jna-4.1.0.jar:./lib/jna-platform-4.1.0.jar:./lib/json-
> simple-1.1.1.jar:./lib/jsp-api-2.1-glassfish-2.1.
> v20091210.jar:./lib/juli-6.0.35.jar:./lib/log4j-1.2-api-2.
> 4.1.jar:./lib/log4j-api-2.4.1.jar:./lib/log4j-core-2.4.1.
> jar:./lib/mail-1.4.5.jar:./lib/serializer-2.7.1.jar:./
> lib/slf4j-api-1.7.24.jar:./lib/slf4j-simple-1.7.24.jar:./
> lib/velocity-1.7.jar:./lib/xalan-2.7.1.jar:./lib/
> xercesImpl-2.10.0.jar:./lib/xml-apis-1.4.01.jar:./lib/
> zookeeper-3.4.10.jar:}}
> >            Reporter: Steph van Schalkwyk
> >            Assignee: Karl Wright
> >            Priority: Major
> >             Fix For: ManifoldCF 2.11
> >
> >         Attachments: CONNECTORS-1518.patch
> >
> >
> >   ```Jul 26, 2018 1:21:51 PM org.apache.tika.config.InitializableProblemHandler$3
> handleInitializableProblem
> >  WARNING: org.xerial's sqlite-jdbc is not loaded.
> >  Please provide the jar on your classpath to parse sqlite files.
> >  See tika-parsers/pom.xml for the correct version.
> >  agents process ran out of memory - shutting down
> >  java.lang.OutOfMemoryError: Java heap space
> >  \{{ {{ at java.base/java.util.Arrays.copyOf(Arrays.java:3816)}}}}
> >  \{{ {{ at java.base/java.util.BitSet.ensureCapacity(BitSet.java:
> 338)}}}}
> >  \{{ {{ at java.base/java.util.BitSet.expandTo(BitSet.java:353)}}}}
> >  \{{ {{ at java.base/java.util.BitSet.set(BitSet.java:448)}}}}
> >  \{{ {{ at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.
> characters(BoilerpipeHTMLContentHandler.java:267)}}}}
> >  \{{ {{ at org.apache.tika.parser.html.BoilerpipeContentHandler.
> characters(BoilerpipeContentHandler.java:155)}}}}
> >  \{{ {{ at org.apache.tika.sax.ContentHandlerDecorator.characters(
> ContentHandlerDecorator.java:146)}}}}
> >  \{{ {{ at org.apache.tika.sax.SecureContentHandler.characters(
> SecureContentHandler.java:270)}}}}
> >  \{{ {{ at org.apache.tika.sax.ContentHandlerDecorator.characters(
> ContentHandlerDecorator.java:146)}}}}
> >  \{{ {{ at org.apache.tika.sax.ContentHandlerDecorator.characters(
> ContentHandlerDecorator.java:146)}}}}
> >  \{{ {{ at org.apache.tika.sax.ContentHandlerDecorator.characters(
> ContentHandlerDecorator.java:146)}}}}
> >  \{{ {{ at org.apache.tika.sax.SafeContentHandler.access$001(
> SafeContentHandler.java:46)}}}}
> >  \{{ {{ at org.apache.tika.sax.SafeContentHandler$1.write(
> SafeContentHandler.java:82)}}}}
> >  \{{ {{ at org.apache.tika.sax.SafeContentHandler.filter(
> SafeContentHandler.java:140)}}}}
> >  \{{ {{ at org.apache.tika.sax.SafeContentHandler.characters(
> SafeContentHandler.java:287)}}}}
> >  \{{ {{ at org.apache.tika.sax.XHTMLContentHandler.characters(
> XHTMLContentHandler.java:279)}}}}
> >  \{{ {{ at org.apache.tika.sax.XHTMLContentHandler.characters(
> XHTMLContentHandler.java:306)}}}}
> >  \{{ {{ at org.apache.tika.parser.microsoft.TextCell.render(
> TextCell.java:34)}}}}
> >  \{{ {{ at org.apache.tika.parser.microsoft.ExcelExtractor$
> TikaHSSFListener.processSheet(ExcelExtractor.java:609)}}}}
> >  \{{ {{ at org.apache.tika.parser.microsoft.ExcelExtractor$
> TikaHSSFListener.internalProcessRecord(ExcelExtractor.java:392)}}}}
> >  \{{ {{ at org.apache.tika.parser.microsoft.ExcelExtractor$
> TikaHSSFListener.processRecord(ExcelExtractor.java:343)}}}}
> >  \{{ {{ at org.apache.poi.hssf.eventusermodel.
> FormatTrackingHSSFListener.processRecord(FormatTrackingHSSFListener.
> java:92)}}}}
> >  \{{ {{ at org.apache.poi.hssf.eventusermodel.HSSFRequest.
> processRecord(HSSFRequest.java:109)}}}}
> >  \{{ {{ at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.
> genericProcessEvents(HSSFEventFactory.java:179)}}}}
> >  \{{ {{ at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.
> processEvents(HSSFEventFactory.java:136)}}}}
> >  \{{ {{ at org.apache.tika.parser.microsoft.ExcelExtractor$
> TikaHSSFListener.processFile(ExcelExtractor.java:319)}}}}
> >  \{{ {{ at org.apache.tika.parser.microsoft.ExcelExtractor.
> parse(ExcelExtractor.java:170)}}}}
> >  \{{ {{ at org.apache.tika.parser.microsoft.OfficeParser.parse(
> OfficeParser.java:184)}}}}
> >  \{{ {{ at org.apache.tika.parser.microsoft.OfficeParser.parse(
> OfficeParser.java:132)}}}}
> >  \{{ {{ at org.apache.tika.parser.CompositeParser.parse(
> CompositeParser.java:280)}}}}
> >  \{{ {{ at org.apache.tika.parser.CompositeParser.parse(
> CompositeParser.java:280)}}}}
> >  \{{ {{ at org.apache.tika.parser.AutoDetectParser.parse(
> AutoDetectParser.java:143)}}}}
> >  {{ [Thread-475] INFO org.eclipse.jetty.server.ServerConnector -
> Stopped ServerConnector@37095ded\{HTTP/1.1}{{
> > {0.0.0.0:8345}
> > }}}}
> >  {{ {{[Thread-475] INFO org.eclipse.jetty.server.handler.ContextHandler
> - Stopped o.e.j.w.WebAppContext@5a6d5a8f
> > {/mcf-api-service,[file:/tmp/jetty-0.0.0.0-8345-mcf-api-
> service.war-_mcf-api-service-any-14189461872304124764.dir/
> webapp/,UNAVAILABLE|file:///tmp/jetty-0.0.0.0-8345-mcf-
> api-service.war-_mcf-api-service-any-14189461872304124764.dir/
> webapp/,UNAVAILABLE]}
> > }}{{
> > {/opt/manifoldcf/manifoldcf_single/././web/war/mcf-api-service.war}}}}}
> >  {{ [Thread-475] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped o.e.j.w.WebAppContext@6979efad{/mcf-authority-service,[file:
> /tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-
> authority-service-any-11619445383548662284.dir/
> webapp/,UNAVAILABLE|file:///tmp/jetty-0.0.0.0-8345-mcf-
> authority-service.war-_mcf-authority-service-any-11619445383548662284.dir/
> webapp/,UNAVAILABLE]}\{/opt/manifoldcf/manifoldcf_single/.
> /./web/war/mcf-authority-service.war}}}
> >  2018-07-26 13:22:47,170 qtp2061226112-492 FATAL Unable to register
> shutdown hook because JVM is shutting down. java.lang.IllegalStateException:
> Cannot add new shutdown hook as this is not started. Current state: STOPPED
> >  \{{ {{ at org.apache.logging.log4j.core.util.
> DefaultShutdownCallbackRegistry.addShutdownCallback(
> DefaultShutdownCallbackRegistry.java:113)}}}}
> >  \{{ {{ at org.apache.logging.log4j.core.impl.Log4jContextFactory.
> addShutdownCallback(Log4jContextFactory.java:271)}}}}
> >  \{{ {{ at org.apache.logging.log4j.core.LoggerContext.
> setUpShutdownHook(LoggerContext.java:256)}}}}
> >  \{{ {{ at org.apache.logging.log4j.core.LoggerContext.start(
> LoggerContext.java:216)}}}}
> >  \{{ {{ at org.apache.logging.log4j.core.impl.Log4jContextFactory.
> getContext(Log4jContextFactory.java:146)}}}}
> >  \{{ {{ at org.apache.logging.log4j.core.impl.Log4jContextFactory.
> getContext(Log4jContextFactory.java:41)}}}}
> >  \{{ {{ at org.apache.logging.log4j.LogManager.getContext(
> LogManager.java:270)}}}}
> >  \{{ {{ at org.apache.log4j.Logger$PrivateManager.getContext(
> Logger.java:59)}}}}
> >  \{{ {{ at org.apache.log4j.Logger.getLogger(Logger.java:37)}}}}
> >  \{{ {{ at org.apache.velocity.runtime.log.Log4JLogChute.init(
> Log4JLogChute.java:72)}}}}
> >  \{{ {{ at org.apache.velocity.runtime.log.LogManager.createLogChute(
> LogManager.java:157)}}}}
> >  \{{ {{ at org.apache.velocity.runtime.log.LogManager.updateLog(
> LogManager.java:269)}}}}
> >  \{{ {{ at org.apache.velocity.runtime.RuntimeInstance.initializeLog(
> RuntimeInstance.java:871)}}}}
> >  \{{ {{ at org.apache.velocity.runtime.RuntimeInstance.init(
> RuntimeInstance.java:262)}}}}
> >  \{{ {{ at org.apache.velocity.runtime.RuntimeInstance.
> requireInitialization(RuntimeInstance.java:302)}}}}
> >  \{{ {{ at org.apache.velocity.runtime.RuntimeInstance.getTemplate(
> RuntimeInstance.java:1531)}}}}
> >  \{{ {{ at org.apache.velocity.app.VelocityEngine.mergeTemplate(
> VelocityEngine.java:343)}}}}
> >  \{{ {{ at org.apache.manifoldcf.ui.i18n.Messages.
> outputResourceWithVelocity(Messages.java:159)}}}}
> >  \{{ {{ at org.apache.manifoldcf.agents.transformation.tika.Messages.
> outputResourceWithVelocity(Messages.java:136)}}}}
> >  \{{ {{ at org.apache.manifoldcf.agents.transformation.tika.
> TikaExtractor.outputSpecificationBody(TikaExtractor.java:544)}}}}
> >  \{{ {{ at org.apache.jsp.editjob_jsp._jspService(editjob_jsp.java:
> 3002)}}}}
> >  \{{ {{ at org.apache.jasper.runtime.HttpJspBase.service(
> HttpJspBase.java:70)}}}}
> >  \{{ {{ at javax.servlet.http.HttpServlet.service(
> HttpServlet.java:790)}}}}
> >  \{{ {{ at org.apache.jasper.servlet.JspServletWrapper.service(
> JspServletWrapper.java:388)}}}}
> >  \{{ {{ at org.apache.jasper.servlet.JspServlet.serviceJspFile(
> JspServlet.java:313)}}}}
> >  \{{ {{ at org.apache.jasper.servlet.JspServlet.service(JspServlet.
> java:260)}}}}
> >  \{{ {{ at javax.servlet.http.HttpServlet.service(
> HttpServlet.java:790)}}}}
> >  \{{ {{ at org.eclipse.jetty.servlet.ServletHolder.handle(
> ServletHolder.java:769)}}}}
> >  \{{ {{ at org.eclipse.jetty.servlet.ServletHandler.doHandle(
> ServletHandler.java:585)}}}}
> >  \{{ {{ at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:143)}}}}
> >  \{{ {{ at org.eclipse.jetty.security.SecurityHandler.handle(
> SecurityHandler.java:577)}}}}
> >  \{{ {{ at org.eclipse.jetty.server.session.SessionHandler.
> doHandle(SessionHandler.java:223)}}}}
> >  \{{ {{ at org.eclipse.jetty.server.handler.ContextHandler.
> doHandle(ContextHandler.java:1125)}}}}
> >  \{{ {{ at org.eclipse.jetty.servlet.ServletHandler.doScope(
> ServletHandler.java:515)}}}}
> >  \{{ {{ at org.eclipse.jetty.server.session.SessionHandler.
> doScope(SessionHandler.java:185)}}}}
> >  \{{ {{ at org.eclipse.jetty.server.handler.ContextHandler.
> doScope(ContextHandler.java:1059)}}}}
> >  \{{ {{ at org.eclipse.jetty.server.handler.ScopedHandler.handle(
> ScopedHandler.java:141)}}}}
> >  \{{ {{ at org.eclipse.jetty.server.handler.ContextHandlerCollection.
> handle(ContextHandlerCollection.java:215)}}}}
> >  \{{ {{ at org.eclipse.jetty.server.handler.HandlerList.handle(
> HandlerList.java:52)}}}}
> >  \{{ {{ at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
> HandlerWrapper.java:97)}}}}
> >  \{{ {{ at org.eclipse.jetty.server.Server.handle(Server.java:497)}}}}
> >  \{{ {{ at org.eclipse.jetty.server.HttpChannel.handle(
> HttpChannel.java:311)}}}}
> >  \{{ {{ at org.eclipse.jetty.server.HttpConnection.onFillable(
> HttpConnection.java:248)}}}}
> >  \{{ {{ at org.eclipse.jetty.io.AbstractConnection$2.run(
> AbstractConnection.java:540)}}}}
> >  \{{ {{ at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
> QueuedThreadPool.java:610)}}}}
> >  \{{ {{ at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(
> QueuedThreadPool.java:539)}}}}
> >  \{{ {{ at java.base/java.lang.Thread.run(Thread.java:844)}}}}[Worker
> thread '35'] WARN org.apache.tika.parser.microsoft.AbstractPOIFSExtractor
> - Ignoring unexpected exception while parsing summary entry
> SummaryInformation
> >  java.lang.RuntimeException: java.nio.channels.
> ClosedByInterruptException
> >  \{{ {{ at org.apache.poi.poifs.filesystem.NPOIFSStream$
> StreamBlockByteBufferIterator.<init>(NPOIFSStream.java:151)}}}}
> >  \{{ {{ at org.apache.poi.poifs.filesystem.NPOIFSStream.
> getBlockIterator(NPOIFSStream.java:95)}}}}
> >  \{{ {{ at org.apache.poi.poifs.filesystem.NPOIFSDocument.
> getBlockIterator(NPOIFSDocument.java:179)}}}}
> >  \{{ {{ at org.apache.poi.poifs.filesystem.NDocumentInputStream.<init>(
> NDocumentInputStream.java:82)}}}}
> >  \{{ {{ at org.apache.poi.poifs.filesystem.DocumentInputStream.<init>(
> DocumentInputStream.java:65)}}}}
> >  \{{ {{ at org.apache.tika.parser.microsoft.SummaryExtractor.
> parseSummaryEntryIfExists(SummaryExtractor.java:83)}}}}
> >  \{{ {{ at org.apache.tika.parser.microsoft.SummaryExtractor.
> parseSummaries(SummaryExtractor.java:73)}}}}
> >  \{{ {{ at org.apache.tika.parser.microsoft.OfficeParser.parse(
> OfficeParser.java:156)}}}}
> >  \{{ {{ at org.apache.tika.parser.microsoft.OfficeParser.parse(
> OfficeParser.java:132)}}}}
> >  \{{ {{ at org.apache.tika.parser.CompositeParser.parse(
> CompositeParser.java:280)}}}}
> >  \{{ {{ at org.apache.tika.parser.CompositeParser.parse(
> CompositeParser.java:280)}}}}
> >  \{{ {{ at org.apache.tika.parser.AutoDetectParser.parse(
> AutoDetectParser.java:143)}}}}
> >  \{{ {{ at org.apache.manifoldcf.agents.transformation.tika.
> TikaParser.parse(TikaParser.java:74)}}}}
> >  \{{ {{ at org.apache.manifoldcf.agents.transformation.tika.
> TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)}}}
> }
> >  \{{ {{ at org.apache.manifoldcf.agents.incrementalingest.
> IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithExcept
> ion(IncrementalIngester.java:3226)}}}}
> >  \{{ {{ at org.apache.manifoldcf.agents.incrementalingest.
> IncrementalIngester$PipelineAddFanout.sendDocument(
> IncrementalIngester.java:3077)}}}}
> >  \{{ {{ at org.apache.manifoldcf.agents.incrementalingest.
> IncrementalIngester$PipelineObjectWithVersions.
> addOrReplaceDocumentWithException(IncrementalIngester.java:2708)}}}}
> >  \{{ {{ at org.apache.manifoldcf.agents.incrementalingest.
> IncrementalIngester.documentIngest(IncrementalIngester.java:756)}}}}
> >  \{{ {{ at org.apache.manifoldcf.crawler.system.WorkerThread$
> ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)}}}}
> >  \{{ {{ at org.apache.manifoldcf.crawler.system.WorkerThread$
> ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)}}}}
> >  \{{ {{ at org.apache.manifoldcf.crawler.connectors.filesystem.
> FileConnector.processDocuments(FileConnector.java:448)}}}}
> >  \{{ {{ at org.apache.manifoldcf.crawler.system.WorkerThread.run(
> WorkerThread.java:399)}}}}
> >  Caused by: java.nio.channels.ClosedByInterruptException
> >  \{{ {{ at java.base/java.nio.channels.spi.AbstractInterruptibleChannel.
> end(AbstractInterruptibleChannel.java:199)}}}}
> >  \{{ {{ at java.base/sun.nio.ch.FileChannelImpl.size(
> FileChannelImpl.java:388)}}}}
> >  \{{ {{ at org.apache.poi.poifs.nio.FileBackedDataSource.size(
> FileBackedDataSource.java:137)}}}}
> >  \{{ {{ at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.
> getChainLoopDetector(NPOIFSFileSystem.java:627)}}}}
> >  \{{ {{ at org.apache.poi.poifs.filesystem.NPOIFSStream$
> StreamBlockByteBufferIterator.<init>(NPOIFSStream.java:149)}}}}
> >  \{{ {{ ... 21 more}}}}
> >  [Worker thread '35'] WARN org.apache.tika.parser.microsoft.AbstractPOIFSExtractor
> - Ignoring unexpected exception while parsing summary entry
> DocumentSummaryInformation
> >  java.lang.RuntimeException: java.nio.channels.ClosedChannelException
> >  \{{ {{ at org.apache.poi.poifs.filesystem.NPOIFSStream$
> StreamBlockByteBufferIterator.<init>(NPOIFSStream.java:151)}}}}
> >  \{{ {{ at org.apache.poi.poifs.filesystem.NPOIFSStream.
> getBlockIterator(NPOIFSStream.java:95)}}}}
> >  \{{ {{ at org.apache.poi.poifs.filesystem.NPOIFSMiniStore.
> getBlockAt(NPOIFSMiniStore.java:67)}}}}
> >  \{{ {{ at org.apache.poi.poifs.filesystem.NPOIFSStream$
> StreamBlockByteBufferIterator.next(NPOIFSStream.java:169)}}}}
> >  \{{ {{ at org.apache.poi.poifs.filesystem.NPOIFSStream$
> StreamBlockByteBufferIterator.next(NPOIFSStream.java:142)}}}}
> >  \{{ {{ at org.apache.poi.poifs.filesystem.NDocumentInputStream.
> readFully(NDocumentInputStream.java:264)}}}}
> >  \{{ {{ at org.apache.poi.poifs.filesystem.NDocumentInputStream.read(
> NDocumentInputStream.java:162)}}}}
> >  \{{ {{ at org.apache.poi.poifs.filesystem.DocumentInputStream.read(
> DocumentInputStream.java:127)}}}}
> >  \{{ {{ at org.apache.poi.util.BoundedInputStream.read(
> BoundedInputStream.java:121)}}}}
> >  \{{ {{ at org.apache.poi.util.BoundedInputStream.read(
> BoundedInputStream.java:103)}}}}
> >  \{{ {{ at org.apache.poi.util.IOUtils.copy(IOUtils.java:312)}}}}
> >  \{{ {{ at org.apache.poi.util.IOUtils.peekFirstNBytes(IOUtils.java:
> 70)}}}}
> >  \{{ {{ at org.apache.poi.hpsf.PropertySet.isPropertySetStream(
> PropertySet.java:393)}}}}
> >  \{{ {{ at org.apache.poi.hpsf.PropertySet.<init>(
> PropertySet.java:191)}}}}
> >  \{{ {{ at org.apache.tika.parser.microsoft.SummaryExtractor.
> parseSummaryEntryIfExists(SummaryExtractor.java:83)}}}}
> >  \{{ {{ at org.apache.tika.parser.microsoft.SummaryExtractor.
> parseSummaries(SummaryExtractor.java:74)}}}}
> >  \{{ {{ at org.apache.tika.parser.microsoft.OfficeParser.parse(
> OfficeParser.java:156)}}}}
> >  \{{ {{ at org.apache.tika.parser.microsoft.OfficeParser.parse(
> OfficeParser.java:132)}}}}
> >  \{{ {{ at org.apache.tika.parser.CompositeParser.parse(
> CompositeParser.java:280)}}}}
> >  \{{ {{ at org.apache.tika.parser.CompositeParser.parse(
> CompositeParser.java:280)}}}}
> >  \{{ {{ at org.apache.tika.parser.AutoDetectParser.parse(
> AutoDetectParser.java:143)}}}}
> >  \{{ {{ at org.apache.manifoldcf.agents.transformation.tika.
> TikaParser.parse(TikaParser.java:74)}}}}
> >  \{{ {{ at org.apache.manifoldcf.agents.transformation.tika.
> TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)}}}
> }
> >  \{{ {{ at org.apache.manifoldcf.agents.incrementalingest.
> IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithExcept
> ion(IncrementalIngester.java:3226)}}}}
> >  \{{ {{ at org.apache.manifoldcf.agents.incrementalingest.
> IncrementalIngester$PipelineAddFanout.sendDocument(
> IncrementalIngester.java:3077)}}}}
> >  \{{ {{ at org.apache.manifoldcf.agents.incrementalingest.
> IncrementalIngester$PipelineObjectWithVersions.
> addOrReplaceDocumentWithException(IncrementalIngester.java:2708)}}}}
> >  \{{ {{ at org.apache.manifoldcf.agents.incrementalingest.
> IncrementalIngester.documentIngest(IncrementalIngester.java:756)}}}}
> >  \{{ {{ at org.apache.manifoldcf.crawler.system.WorkerThread$
> ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)}}}}
> >  \{{ {{ at org.apache.manifoldcf.crawler.system.WorkerThread$
> ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)}}}}
> >  \{{ {{ at org.apache.manifoldcf.crawler.connectors.filesystem.
> FileConnector.processDocuments(FileConnector.java:448)}}}}
> >  \{{ {{ at org.apache.manifoldcf.crawler.system.WorkerThread.run(
> WorkerThread.java:399)}}}}
> >  Caused by: java.nio.channels.ClosedChannelException
> >  \{{ {{ at java.base/sun.nio.ch.FileChannelImpl.ensureOpen(
> FileChannelImpl.java:158)}}}}
> >  \{{ {{ at java.base/sun.nio.ch.FileChannelImpl.size(
> FileChannelImpl.java:373)}}}}
> >  \{{ {{ at org.apache.poi.poifs.nio.FileBackedDataSource.size(
> FileBackedDataSource.java:137)}}}}
> >  \{{ {{ at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.
> getChainLoopDetector(NPOIFSFileSystem.java:627)}}}}
> >  \{{ {{ at org.apache.poi.poifs.filesystem.NPOIFSStream$
> StreamBlockByteBufferIterator.<init>(NPOIFSStream.java:149)}}}}
> >  \{{ {{ ... 30 more}}}} ```
> >
> > Following up: When these exceptions occur, the heap runs out:
> >
> >  13:39:39.856 [Worker thread '49'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:39.970 [Worker thread '43'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:40.415 [Worker thread '34'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:40.469 [Worker thread '1'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:43.739 [Worker thread '32'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:44.697 [Worker thread '43'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:45.756 [Worker thread '33'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:45.775 [Worker thread '36'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:46.751 [Worker thread '35'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:46.753 [Worker thread '40'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:47.536 [Worker thread '45'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:48.734 [Worker thread '44'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:50.922 [Worker thread '30'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:54.930 [Worker thread '28'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:40:33.660 [Worker thread '29'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  agents process ran out of memory - shutting down
> >  java.lang.OutOfMemoryError: Java heap space
> >  \{{ at java.base/java.lang.StringLatin1.newString(
> StringLatin1.java:549)}}
> >  \{{ at java.base/java.lang.StringBuilder.toString(
> StringBuilder.java:415)}}
> >  \{{ at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(
> BoilerpipeHTMLContentHandler.java:341)}}
> >  \{{ at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(
> BoilerpipeHTMLContentHandler.java:198)}}
> >  \{{ at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(
> BoilerpipeContentHandler.java:155)}}
> >  \{{ at org.apache.tika.sax.ContentHandlerDecorator.characters(
> ContentHandlerDecorator.java:146)}}
> >  \{{ at org.apache.tika.sax.SecureContentHandler.characters(
> SecureContentHandler.java:270)}}
> >  \{{ at org.apache.tika.sax.ContentHandlerDecorator.characters(
> ContentHandlerDecorator.java:146)}}
> >  \{{ at org.apache.tika.sax.ContentHandlerDecorator.characters(
> ContentHandlerDecorator.java:146)}}
> >  \{{ at org.apache.tika.sax.ContentHandlerDecorator.characters(
> ContentHandlerDecorator.java:146)}}
> >  \{{ at org.apache.tika.sax.SafeContentHandler.access$001(
> SafeContentHandler.java:46)}}
> >  \{{ at org.apache.tika.sax.SafeContentHandler$1.write(
> SafeContentHandler.java:82)}}
> >  \{{ at org.apache.tika.sax.SafeContentHandler.filter(
> SafeContentHandler.java:140)}}
> >  \{{ at org.apache.tika.sax.SafeContentHandler.characters(
> SafeContentHandler.java:287)}}
> >  \{{ at org.apache.tika.sax.XHTMLContentHandler.characters(
> XHTMLContentHandler.java:279)}}
> >  \{{ at org.apache.tika.sax.XHTMLContentHandler.characters(
> XHTMLContentHandler.java:306)}}
> >  \{{ at org.apache.tika.parser.microsoft.TextCell.render(
> TextCell.java:34)}}
> >  \{{ at org.apache.tika.parser.microsoft.ExcelExtractor$
> TikaHSSFListener.processSheet(ExcelExtractor.java:609)}}
> >  \{{ at org.apache.tika.parser.microsoft.ExcelExtractor$
> TikaHSSFListener.internalProcessRecord(ExcelExtractor.java:392)}}
> >  \{{ at org.apache.tika.parser.microsoft.ExcelExtractor$
> TikaHSSFListener.processRecord(ExcelExtractor.java:343)}}
> >  \{{ at org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.
> processRecord(FormatTrackingHSSFListener.java:92)}}
> >  \{{ at org.apache.poi.hssf.eventusermodel.HSSFRequest.
> processRecord(HSSFRequest.java:109)}}
> >  \{{ at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.
> genericProcessEvents(HSSFEventFactory.java:179)}}
> >  \{{ at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.
> processEvents(HSSFEventFactory.java:136)}}
> >  \{{ at org.apache.tika.parser.microsoft.ExcelExtractor$
> TikaHSSFListener.processFile(ExcelExtractor.java:319)}}
> >  \{{ at org.apache.tika.parser.microsoft.ExcelExtractor.
> parse(ExcelExtractor.java:170)}}
> >  \{{ at org.apache.tika.parser.microsoft.OfficeParser.parse(
> OfficeParser.java:184)}}
> >  \{{ at org.apache.tika.parser.microsoft.OfficeParser.parse(
> OfficeParser.java:132)}}
> >  \{{ at org.apache.tika.parser.CompositeParser.parse(
> CompositeParser.java:280)}}
> >  \{{ at org.apache.tika.parser.CompositeParser.parse(
> CompositeParser.java:280)}}
> >  \{{ at org.apache.tika.parser.AutoDetectParser.parse(
> AutoDetectParser.java:143)}}
> >  \{{ at org.apache.manifoldcf.agents.transformation.tika.
> TikaParser.parse(TikaParser.java:74)}}
> >  agents process ran out of memory - shutting down
> >  java.lang.OutOfMemoryError: Java heap space
> >  \{{ at java.base/java.util.Arrays.copyOf(Arrays.java:3744)}}
> >  \{{ at java.base/java.lang.AbstractStringBuilder.
> ensureCapacityInternal(AbstractStringBuilder.java:146)}}
> >  \{{ at java.base/java.lang.AbstractStringBuilder.append(
> AbstractStringBuilder.java:531)}}
> >  \{{ at java.base/java.lang.AbstractStringBuilder.append(
> AbstractStringBuilder.java:550)}}
> >  \{{ at java.base/java.lang.StringBuilder.append(
> StringBuilder.java:171)}}
> >  \{{ at java.base/java.util.regex.Matcher.appendReplacement(
> Matcher.java:1002)}}
> >  \{{ at java.base/java.util.regex.Matcher.replaceAll(Matcher.
> java:1181)}}
> >  \{{ at de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(
> UnicodeTokenizer.java:40)}}
> >  \{{ at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(
> BoilerpipeHTMLContentHandler.java:296)}}
> >  \{{ at de.l3s.boilerpipe.sax.CommonTagActions$3.end(
> CommonTagActions.java:143)}}
> >  \{{ at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.endElement(
> BoilerpipeHTMLContentHandler.java:183)}}
> >  \{{ at org.apache.tika.parser.html.BoilerpipeContentHandler.endElement(
> BoilerpipeContentHandler.java:175)}}
> >  \{{ at org.apache.tika.sax.ContentHandlerDecorator.endElement(
> ContentHandlerDecorator.java:136)}}
> >  \{{ at org.apache.tika.sax.SecureContentHandler.endElement(
> SecureContentHandler.java:256)}}
> >  \{{ at org.apache.tika.sax.ContentHandlerDecorator.endElement(
> ContentHandlerDecorator.java:136)}}
> >  \{{ at org.apache.tika.sax.ContentHandlerDecorator.endElement(
> ContentHandlerDecorator.java:136)}}
> >  \{{ at org.apache.tika.sax.ContentHandlerDecorator.endElement(
> ContentHandlerDecorator.java:136)}}
> >  \{{ at org.apache.tika.sax.SafeContentHandler.endElement(
> SafeContentHandler.java:273)}}
> >  \{{ at org.apache.tika.sax.XHTMLContentHandler.endDocument(
> XHTMLContentHandler.java:224)}}
> >  \{{ at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:109)}}
> >  \{{ at org.apache.tika.parser.CompositeParser.parse(
> CompositeParser.java:280)}}
> >  \{{ at org.apache.tika.parser.CompositeParser.parse(
> CompositeParser.java:280)}}
> >  \{{ at org.apache.tika.parser.AutoDetectParser.parse(
> AutoDetectParser.java:143)}}
> >  \{{ at org.apache.manifoldcf.agents.transformation.tika.
> TikaParser.parse(TikaParser.java:74)}}
> >  \{{ at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.
> addOrReplaceDocumentWithException(TikaExtractor.java:235)}}
> >  \{{ at org.apache.manifoldcf.agents.incrementalingest.
> IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithExcept
> ion(IncrementalIngester.java:3226)}}
> >  \{{ at org.apache.manifoldcf.agents.incrementalingest.
> IncrementalIngester$PipelineAddFanout.sendDocument(
> IncrementalIngester.java:3077)}}
> >  \{{ at org.apache.manifoldcf.agents.incrementalingest.
> IncrementalIngester$PipelineObjectWithVersions.
> addOrReplaceDocumentWithException(IncrementalIngester.java:2708)}}
> >  \{{ at org.apache.manifoldcf.agents.incrementalingest.
> IncrementalIngester.documentIngest(IncrementalIngester.java:756)}}
> >  \{{ at org.apache.manifoldcf.crawler.system.WorkerThread$
> ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)}}
> >  \{{ at org.apache.manifoldcf.crawler.system.WorkerThread$
> ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)}}
> >  \{{ at org.apache.manifoldcf.crawler.connectors.filesystem.
> FileConnector.processDocuments(FileConnector.java:448)}}
> >  13:40:33.995 [Worker thread '42'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  [Thread-475] INFO org.eclipse.jetty.server.ServerConnector - Stopped
> ServerConnector@5d235104\{HTTP/1.1}{0.0.0.0:8345}
> >  {{[Thread-475] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped o.e.j.w.WebAppContext@6105f8a3\{/mcf-api-service,[file:/tmp/
> jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-
> any-9896962439762567079.dir/webapp/,UNAVAILABLE|file:///
> tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-
> 9896962439762567079.dir/webapp/,UNAVAILABLE]}{/opt/
> manifoldcf/manifoldcf_single/././web/war/mcf-api-service.war}
> >
> >  }}
> >  {{[Thread-475] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped o.e.j.w.WebAppContext@12365c88\{/mcf-authority-service,[
> file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_
> mcf-authority-service-any-3954308360064638561.dir/
> webapp/,UNAVAILABLE|file:///tmp/jetty-0.0.0.0-8345-mcf-
> authority-service.war-_mcf-authority-service-any-3954308360064638561.dir/
> webapp/,UNAVAILABLE]}
> >  \{/opt/manifoldcf/manifoldcf_single/././web/war/mcf-
> authority-service.war}
> >
> >  }}
> >
> >
> >
> >  Follow-up: When these exceptions occur, the JVM runs out of heap space:
> >
> >  13:39:39.856 [Worker thread '49'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:39.970 [Worker thread '43'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:40.415 [Worker thread '34'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:40.469 [Worker thread '1'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:43.739 [Worker thread '32'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:44.697 [Worker thread '43'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:45.756 [Worker thread '33'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:45.775 [Worker thread '36'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:46.751 [Worker thread '35'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:46.753 [Worker thread '40'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:47.536 [Worker thread '45'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:48.734 [Worker thread '44'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:50.922 [Worker thread '30'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:39:54.930 [Worker thread '28'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  13:40:33.660 [Worker thread '29'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  agents process ran out of memory - shutting down
> >  java.lang.OutOfMemoryError: Java heap space
> >  at java.base/java.lang.StringLatin1.newString(StringLatin1.java:549)
> >  at java.base/java.lang.StringBuilder.toString(StringBuilder.java:415)
> >  at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(
> BoilerpipeHTMLContentHandler.java:341)
> >  at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(
> BoilerpipeHTMLContentHandler.java:198)
> >  at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(
> BoilerpipeContentHandler.java:155)
> >  at org.apache.tika.sax.ContentHandlerDecorator.characters(
> ContentHandlerDecorator.java:146)
> >  at org.apache.tika.sax.SecureContentHandler.characters(
> SecureContentHandler.java:270)
> >  at org.apache.tika.sax.ContentHandlerDecorator.characters(
> ContentHandlerDecorator.java:146)
> >  at org.apache.tika.sax.ContentHandlerDecorator.characters(
> ContentHandlerDecorator.java:146)
> >  at org.apache.tika.sax.ContentHandlerDecorator.characters(
> ContentHandlerDecorator.java:146)
> >  at org.apache.tika.sax.SafeContentHandler.access$001(
> SafeContentHandler.java:46)
> >  at org.apache.tika.sax.SafeContentHandler$1.write(
> SafeContentHandler.java:82)
> >  at org.apache.tika.sax.SafeContentHandler.filter(
> SafeContentHandler.java:140)
> >  at org.apache.tika.sax.SafeContentHandler.characters(
> SafeContentHandler.java:287)
> >  at org.apache.tika.sax.XHTMLContentHandler.characters(
> XHTMLContentHandler.java:279)
> >  at org.apache.tika.sax.XHTMLContentHandler.characters(
> XHTMLContentHandler.java:306)
> >  at org.apache.tika.parser.microsoft.TextCell.render(TextCell.java:34)
> >  at org.apache.tika.parser.microsoft.ExcelExtractor$
> TikaHSSFListener.processSheet(ExcelExtractor.java:609)
> >  at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.
> internalProcessRecord(ExcelExtractor.java:392)
> >  at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.
> processRecord(ExcelExtractor.java:343)
> >  at org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.
> processRecord(FormatTrackingHSSFListener.java:92)
> >  at org.apache.poi.hssf.eventusermodel.HSSFRequest.
> processRecord(HSSFRequest.java:109)
> >  at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.
> genericProcessEvents(HSSFEventFactory.java:179)
> >  at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(
> HSSFEventFactory.java:136)
> >  at org.apache.tika.parser.microsoft.ExcelExtractor$
> TikaHSSFListener.processFile(ExcelExtractor.java:319)
> >  at org.apache.tika.parser.microsoft.ExcelExtractor.
> parse(ExcelExtractor.java:170)
> >  at org.apache.tika.parser.microsoft.OfficeParser.parse(
> OfficeParser.java:184)
> >  at org.apache.tika.parser.microsoft.OfficeParser.parse(
> OfficeParser.java:132)
> >  at org.apache.tika.parser.CompositeParser.parse(
> CompositeParser.java:280)
> >  at org.apache.tika.parser.CompositeParser.parse(
> CompositeParser.java:280)
> >  at org.apache.tika.parser.AutoDetectParser.parse(
> AutoDetectParser.java:143)
> >  at org.apache.manifoldcf.agents.transformation.tika.
> TikaParser.parse(TikaParser.java:74)
> >  agents process ran out of memory - shutting down
> >  java.lang.OutOfMemoryError: Java heap space
> >  at java.base/java.util.Arrays.copyOf(Arrays.java:3744)
> >  at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(
> AbstractStringBuilder.java:146)
> >  at java.base/java.lang.AbstractStringBuilder.append(
> AbstractStringBuilder.java:531)
> >  at java.base/java.lang.AbstractStringBuilder.append(
> AbstractStringBuilder.java:550)
> >  at java.base/java.lang.StringBuilder.append(StringBuilder.java:171)
> >  at java.base/java.util.regex.Matcher.appendReplacement(
> Matcher.java:1002)
> >  at java.base/java.util.regex.Matcher.replaceAll(Matcher.java:1181)
> >  at de.l3s.boilerpipe.util.UnicodeTokenizer.tokenize(
> UnicodeTokenizer.java:40)
> >  at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.flushBlock(
> BoilerpipeHTMLContentHandler.java:296)
> >  at de.l3s.boilerpipe.sax.CommonTagActions$3.end(
> CommonTagActions.java:143)
> >  at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.endElement(
> BoilerpipeHTMLContentHandler.java:183)
> >  at org.apache.tika.parser.html.BoilerpipeContentHandler.endElement(
> BoilerpipeContentHandler.java:175)
> >  at org.apache.tika.sax.ContentHandlerDecorator.endElement(
> ContentHandlerDecorator.java:136)
> >  at org.apache.tika.sax.SecureContentHandler.endElement(
> SecureContentHandler.java:256)
> >  at org.apache.tika.sax.ContentHandlerDecorator.endElement(
> ContentHandlerDecorator.java:136)
> >  at org.apache.tika.sax.ContentHandlerDecorator.endElement(
> ContentHandlerDecorator.java:136)
> >  at org.apache.tika.sax.ContentHandlerDecorator.endElement(
> ContentHandlerDecorator.java:136)
> >  at org.apache.tika.sax.SafeContentHandler.endElement(
> SafeContentHandler.java:273)
> >  at org.apache.tika.sax.XHTMLContentHandler.endDocument(
> XHTMLContentHandler.java:224)
> >  at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:109)
> >  at org.apache.tika.parser.CompositeParser.parse(
> CompositeParser.java:280)
> >  at org.apache.tika.parser.CompositeParser.parse(
> CompositeParser.java:280)
> >  at org.apache.tika.parser.AutoDetectParser.parse(
> AutoDetectParser.java:143)
> >  at org.apache.manifoldcf.agents.transformation.tika.
> TikaParser.parse(TikaParser.java:74)
> >  at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.
> addOrReplaceDocumentWithException(TikaExtractor.java:235)
> >  at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$
> PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
> >  at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$
> PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
> >  at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$
> PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
> >  at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.
> documentIngest(IncrementalIngester.java:756)
> >  at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.
> ingestDocumentWithException(WorkerThread.java:1583)
> >  at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.
> ingestDocumentWithException(WorkerThread.java:1548)
> >  at org.apache.manifoldcf.crawler.connectors.filesystem.FileConnector.
> processDocuments(FileConnector.java:448)
> >  13:40:33.995 [Worker thread '42'] WARN org.apache.manifoldcf.jobs -
> Service interruption reported for job 1532551209410 connection 'file': IO
> exception: null
> >  [Thread-475] INFO org.eclipse.jetty.server.ServerConnector - Stopped
> ServerConnector@5d235104{HTTP/1.1}{0.0.0.0:8345}
> >  [Thread-475] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped o.e.j.w.WebAppContext@6105f8a3{/mcf-api-service,[file:/tmp/
> jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-
> any-9896962439762567079.dir/webapp/,UNAVAILABLE|file:///
> tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-
> 9896962439762567079.dir/webapp/,UNAVAILABLE]}{/opt/
> manifoldcf/manifoldcf_single/././web/war/mcf-api-service.war}
> > [Thread-475] INFO org.eclipse.jetty.server.handler.ContextHandler -
> Stopped o.e.j.w.WebAppContext@12365c88{/mcf-authority-service,[file:
> /tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-
> authority-service-any-3954308360064638561.dir/webapp/,UNAVAILABLE|file:///
> tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-
> authority-service-any-3954308360064638561.dir/webapp/,UNAVAILABLE]}
> > {/opt/manifoldcf/manifoldcf_single/././web/war/mcf-
> authority-service.war}
> >  This occurs when the ES connector reports this error:
> > |07-26-2018 19:34:25.356|Indexation (ES)|file:/var/manifoldcf/
> corpus/000640.html|CLIENTPROTOCOLEXCEPTION|46190|9|
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>
>
