lucene-solr-user mailing list archives

From "Augusto Camarotti" <augu...@prpb.mpf.gov.br>
Subject Re: Solr hanging when extracting a some broken .doc files
Date Thu, 19 Dec 2013 18:31:36 GMT
Hey Andrea! Thanks for answering. The complete stack trace follows below (the
other one is just the same).
I'm going to try changing the logging level, but I'm seriously considering debugging
Tika and trying to fix it myself.
 
 

03:38:23 ERROR SolrCore org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@386f9474
org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@386f9474
 at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:225)
 at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
 at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
 at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:368)
 at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
 at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
 at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
 at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:647)
 at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
 at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
 at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
 at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
 at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@386f9474
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
 at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
 at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
 ... 32 more
Caused by: java.lang.IllegalStateException: Told we're for characters 122 -> 978, but actually covers 855 characters!
 at org.apache.poi.hwpf.model.TextPiece.<init>(TextPiece.java:73)
 at org.apache.poi.hwpf.model.TextPieceTable.<init>(TextPieceTable.java:111)
 at org.apache.poi.hwpf.model.ComplexFileTable.<init>(ComplexFileTable.java:70)
 at org.apache.poi.hwpf.HWPFOldDocument.<init>(HWPFOldDocument.java:72)
 at org.apache.tika.parser.microsoft.WordExtractor.parseWord6(WordExtractor.java:462)
 at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:81)
 at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:186)
 at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161)
 at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
 ... 35 more
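
A note on the innermost cause above: the message reports a declared character range of 122 -> 978, which is 978 - 122 = 856 characters, while the decoded text covers only 855. The sketch below reproduces that kind of consistency check in isolation. It is an illustration only, not POI's actual code; the class `PieceRangeCheck` and the method `checkRange` are hypothetical names.

```java
public class PieceRangeCheck {

    /**
     * Hypothetical helper: rejects a text piece whose declared range
     * [start, end) does not match the length of the decoded text.
     */
    public static void checkRange(int start, int end, CharSequence text) {
        int declared = end - start;
        if (declared != text.length()) {
            throw new IllegalStateException("Told we're for characters " + start + " -> " + end
                    + ", but actually covers " + text.length() + " characters!");
        }
    }

    public static void main(String[] args) {
        // 856 characters declared, but only 855 decoded, as in the trace.
        String decoded = new String(new char[855]);
        try {
            checkRange(122, 978, decoded);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the check throws an unchecked exception from a constructor, it surfaces in Tika as the "Unexpected RuntimeException" seen at the top of the trace.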


>>> Andrea Gazzarini <a.gazzarini@gmail.com> 17/12/2013 16:43 >>>
Hi Augusto,
I don't believe the mailing list allows attachments. Could you please post
the complete stack trace? In addition, set the logging level of the Tika classes
to FINEST in the Solr console; that may be helpful.

Best,
Andrea
On 17 Dec 2013 16:30, "Augusto Camarotti" <augusto@prpb.mpf.gov.br> wrote:

>  Hi guys,
>
>    I'm having a problem with Solr when trying to index some broken .doc
> files.
>    I have set up a test case using Solr to index all the files that users
> save in the shared directories of the company I work for, and Solr
> hangs when trying to index one file in particular (the one I'm attaching
> to this e-mail). There are other broken .doc files that Solr indexes by
> name without a problem, even though it logs some Tika errors during the
> process, but when it reaches this particular file, it hangs and I have
> to cancel the upload.
>    I cannot guarantee the directories will never hold a broken .doc file,
> or a broken file with some other extension, so I guess Solr could just
> return a failure message, or something like that.
>    These are the log messages Solr is recording:
>
>
>   03:38:23 ERROR SolrCore org.apache.solr.common.SolrException:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from
> org.apache.tika.parser.microsoft.OfficeParser@386f9474
>
>   03:38:25 ERROR SolrDispatchFilter null:org.apache.solr.common.SolrException:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from
> org.apache.tika.parser.microsoft.OfficeParser@386f9474
>
> So, how do I prevent Solr from hanging when trying to index broken files?
>
> Regards,
>
> Augusto Camarotti
>
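
One way to approach the question above from the uploading side is to run each extraction or upload in a worker thread with a timeout, and turn both exceptions and timeouts into a failure message for that one file. The sketch below shows only that pattern, using the standard `java.util.concurrent` classes. It does not call Solr or Tika: the class `GuardedExtraction`, the method `extractOrReport`, and the stand-in tasks in `main` are hypothetical, and a real client would put its own upload call inside the `Callable`. It also assumes that abandoning a stuck file is acceptable, and it does not fix a hang that occurs inside the server.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class GuardedExtraction {

    /** Runs one task; returns its text, or a "FAILED: ..." message instead of hanging. */
    public static String extractOrReport(Callable<String> task, long timeoutMillis) {
        // Daemon thread, so a task that ignores interruption cannot keep the JVM alive.
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "extraction-worker");
            t.setDaemon(true);
            return t;
        });
        Future<String> future = pool.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker and give up on this file
            return "FAILED: timed out after " + timeoutMillis + " ms";
        } catch (ExecutionException e) {
            // The task itself threw, e.g. a parser RuntimeException on a corrupt file.
            return "FAILED: " + e.getCause();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return "FAILED: interrupted";
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-ins for real uploads: one throws, one does not finish in time, one succeeds.
        System.out.println(extractOrReport(() -> {
            throw new IllegalStateException("broken .doc");
        }, 1000));
        System.out.println(extractOrReport(() -> {
            Thread.sleep(60_000);
            return "never reached";
        }, 200));
        System.out.println(extractOrReport(() -> "plain text", 1000));
    }
}
```

With a guard like this, a crawler over the shared directories can log the failing file name and move on to the next file rather than blocking on one upload.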
