lucene-java-user mailing list archives

From Clemens Wyss DEV <clemens...@mysign.ch>
Subject [tika] ForkParser, Lost connection to a forked server process
Date Wed, 18 Feb 2015 07:32:05 GMT
Sorry for cross-posting, but the Tika mailing list does not seem to be very active.
I am trying to use the ForkParser. Unfortunately, I get "Lost connection to a forked
server process" for an (encrypted) PDF that I can extract in-process. Extracting the
document in-process takes approx. 40 s (!). Extracting other (smaller) documents works
with the ForkParser.

Memory should be no problem:
forkParser.setJavaCommand("java -Xmx2048m -Xdebug");

When I run the unit test with the ForkParser, the test stops after about 10 seconds. The
console output looks like this:
...
SLF4J: Found binding in [tika-in-memory://localhost/3]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
07:28:01.909 [main] INFO  o.apache.pdfbox.pdfparser.PDFParser - Document is encrypted
07:28:02.239 [main] DEBUG o.a.p.p.PDFObjectStreamParser - parsed=COSObject{706, 0}
07:28:02.239 [main] DEBUG o.a.p.p.PDFObjectStreamParser - parsed=COSObject{707, 0}
07:28:02.239 [main] DEBUG o.a.p.p.PDFObjectStreamParser - parsed=COSObject{708, 0}
...
07:28:02.249 [main] DEBUG o.a.p.p.PDFObjectStreamParser - parsed=COSObject{752, 0}
07:28:02.249 [main] DEBUG o.a.p.p.PDFObjectStreamParser - parsed=COSObject{753, 0}
07:28:02.249 [main] DEBUG o.a.p.p.PDFObjectStreamParser - parsed=COSObject{754, 0}
07:28:11.465 [main] ERROR ch.mysign.sky.indexing.IndexUtility - failed to extract text from input stream
org.apache.tika.exception.TikaException: Failed to communicate with a forked parser process. The process has most likely crashed due to some error like running out of memory. A new process will be started for the next parsing request.
	at org.apache.tika.fork.ForkParser.parse(ForkParser.java:142) ~[tika-core.jar:1.7]
	at ch.mysign.sky.indexing.IndexUtility.extractTextFrom(IndexUtility.java:158) [target/:na]
	at ch.mysign.sky.indexing.IndexUtility.extractTextFrom(IndexUtility.java:84) [target/:na]
	at ch.mysign.sky.indexing.IndexUtility.extractTextFrom(IndexUtility.java:70) [target/:na]
	at ch.mysign.sky.indexing.IndexUtilityTest.diesesPdfAuslesenDauertEwig(IndexUtilityTest.java:193) [target/:na]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_25]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_25]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_25]
	...
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309) [selenium-server-standalone.jar:na]
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50) [.cp/:na]
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) [.cp/:na]
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459) [.cp/:na]
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675) [.cp/:na]
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382) [.cp/:na]
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192) [.cp/:na]
Caused by: java.io.IOException: Lost connection to a forked server process
	at org.apache.tika.fork.ForkClient.waitForResponse(ForkClient.java:191) ~[tika-core.jar:1.7]
	at org.apache.tika.fork.ForkClient.call(ForkClient.java:125) ~[tika-core.jar:1.7]
	at org.apache.tika.fork.ForkParser.parse(ForkParser.java:134) ~[tika-core.jar:1.7]
	... 38 common frames omitted
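
For reference, the "about 10 seconds" can be read off the log timestamps above. The following is only a small sketch using the JDK's java.time classes; the class name LogGap and the helper millisBetween are made up for illustration and are not part of Tika or of the test code.

```java
import java.time.Duration;
import java.time.LocalTime;

class LogGap {
    // Milliseconds between two HH:mm:ss.SSS timestamps taken from the same day's log.
    static long millisBetween(String from, String to) {
        return Duration.between(LocalTime.parse(from), LocalTime.parse(to)).toMillis();
    }

    public static void main(String[] args) {
        // First log line shown ("Document is encrypted") to the ERROR line.
        System.out.println(millisBetween("07:28:01.909", "07:28:11.465")); // 9556
        // Last DEBUG line before the silence to the ERROR line.
        System.out.println(millisBetween("07:28:02.249", "07:28:11.465")); // 9216
    }
}
```

So roughly 9.2 s pass without any log output between the last parsed object and the error.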

Am I running into a timeout? What else can I investigate?
