lucene-dev mailing list archives

From "Jack Krupansky (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3775) Unexpected RuntimeException
Date Fri, 31 Aug 2012 12:55:09 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445886#comment-13445886 ]

Jack Krupansky commented on SOLR-3775:
--------------------------------------

Thanks for reporting the issue. Although the Solr project can't fix Tika/POI issues
directly, it is very useful for us to be able to tell Solr/SolrCell users that MS Word 97
files may encounter ingestion problems.

Can you confirm whether all of your Word 97 files fail to parse, or only some of them?
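
One way to answer that for a batch of files is to save the response body from each
extract request and tally the outcomes. The following is a minimal Python sketch, assuming
each response uses the XML error format quoted in the report below. The function names
summarize_solr_response and tally are hypothetical helpers for illustration, not part of
Solr, Tika, or POI.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_solr_response(xml_text):
    """Return (status, root_cause) for one Solr XML response body.

    Assumes the layout seen in the report: a responseHeader list with an
    int named "status" and, on failure, an error list with a "trace" string.
    """
    root = ET.fromstring(xml_text.strip())
    status_text = root.findtext(
        "lst[@name='responseHeader']/int[@name='status']"
    )
    status = int(status_text) if status_text is not None else None

    trace = root.findtext("lst[@name='error']/str[@name='trace']") or ""
    # In a Java stack trace the last "Caused by:" line names the innermost
    # exception, which is usually the one worth matching against a bug report.
    causes = [
        line.strip()
        for line in trace.splitlines()
        if line.strip().startswith("Caused by:")
    ]
    root_cause = causes[-1][len("Caused by:"):].strip() if causes else None
    return status, root_cause


def tally(responses):
    """Count (status, root_cause) pairs.

    `responses` maps a file name to the response body saved for that file.
    """
    return Counter(
        summarize_solr_response(body) for body in responses.values()
    )
```

If every failing file reports the same innermost exception, that suggests a single
underlying parser bug rather than several unrelated ones.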

This may be this POI bug:
https://issues.apache.org/bugzilla/show_bug.cgi?id=53380

Please comment on that bug directly if you feel it does match and indicate its level of importance
to you. It does not appear to have seen any activity since it was reported back in June.

                
> Unexpected RuntimeException
> ---------------------------
>
>                 Key: SOLR-3775
>                 URL: https://issues.apache.org/jira/browse/SOLR-3775
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.0-BETA
>            Reporter: Alex C
>            Assignee: Uwe Schindler
>
> Hi. I'm using Solr 4.0 Beta (no modifications to default installation) to index, and it's blowing up on Word *.DOC files:
> {code}curl "http://localhost:8983/solr/update/extract?literal.id=doc15&commit=true" -F "myfile=@15.doc"{code}
> Here's the exception. And the same files go through Solr 3.6.1 just fine.
> {noformat}    <?xml version="1.0" encoding="UTF-8"?>
>     <response>
>     <lst name="responseHeader"><int name="status">500</int><int name="QTime">18</int></lst>
>     <lst name="error"><str name="msg">org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@328c62ce</str>
>     <str name="trace">org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@328c62ce
>             at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:230)
>             at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>             at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>             at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
>             at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
>             at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
>             at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
>             at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
>             at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
>             at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
>             at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
>             at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
>             at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
>             at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
>             at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
>             at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
>             at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
>             at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
>             at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
>             at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
>             at org.eclipse.jetty.server.Server.handle(Server.java:351)
>             at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
>             at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
>             at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
>             at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)
>             at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:642)
>             at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
>             at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
>             at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
>             at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
>             at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
>             at java.lang.Thread.run(Unknown Source)
>     Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@328c62ce
>             at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
>             at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>             at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>             at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:224)
>             ... 31 more
>     Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
>             at org.apache.poi.util.LittleEndian.getInt(LittleEndian.java:163)
>             at org.apache.poi.hwpf.model.Colorref.&lt;init&gt;(Colorref.java:81)
>             at org.apache.poi.hwpf.model.types.SHDAbstractType.fillFields(SHDAbstractType.java:56)
>             at org.apache.poi.hwpf.usermodel.ShadingDescriptor.&lt;init&gt;(ShadingDescriptor.java:38)
>             at org.apache.poi.hwpf.sprm.CharacterSprmUncompressor.unCompressCHPOperation(CharacterSprmUncompressor.java:582)
>             at org.apache.poi.hwpf.sprm.CharacterSprmUncompressor.uncompressCHP(CharacterSprmUncompressor.java:65)
>             at org.apache.poi.hwpf.model.StyleSheet.createChp(StyleSheet.java:288)
>             at org.apache.poi.hwpf.model.StyleSheet.&lt;init&gt;(StyleSheet.java:121)
>             at org.apache.poi.hwpf.HWPFDocument.&lt;init&gt;(HWPFDocument.java:346)
>             at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:77)
>             at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:185)
>             at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:160)
>             at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>             ... 34 more
>     </str><int name="code">500</int></lst>
>     </response>{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

