tika-dev mailing list archives

From "Ross Keatinge (JIRA)" <j...@apache.org>
Subject [jira] Created: (TIKA-397) Parser crashes on very simple file
Date Thu, 01 Apr 2010 11:52:31 GMT
Parser crashes on very simple file
----------------------------------

                 Key: TIKA-397
                 URL: https://issues.apache.org/jira/browse/TIKA-397
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.4
         Environment: Solr 1.4 on Ubuntu 9.10.  OpenJDK Runtime Environment (IcedTea6 1.6.1)
(6b16-1.6.1-3ubuntu1)
            Reporter: Ross Keatinge


Sorry, but I can only talk about this from a Solr user's point of view. I'm using Solr's ExtractingRequestHandler
(Solr Cell) to index some text files. In general it's working fine, but Tika crashes when parsing
a text file with certain short upper-case words near the start of the file. I haven't
been able to discover the pattern of what works and what doesn't, but here's a really simple
example.

A file with just the letters XE and nothing else crashes. If I edit the file and change it
to any of XA, XB, XC, XD or XF it works, but XE always crashes it. Lower case works.

I discovered this with certain five-letter words that unfortunately are very common in my
documents.

Here's the error message from Solr.

Apache Tomcat/6.0.20 - Error report

HTTP Status 500 - org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@a51027

org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected
RuntimeException from org.apache.tika.parser.txt.TXTParser@a51027
	at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
	at java.lang.Thread.run(Thread.java:636)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@a51027
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
	at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
	... 18 more
Caused by: java.lang.NullPointerException
	at java.io.Reader.<init>(Reader.java:78)
	at java.io.BufferedReader.<init>(BufferedReader.java:93)
	at java.io.BufferedReader.<init>(BufferedReader.java:108)
	at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:59)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
	... 20 more
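
For anyone reading the trace: the innermost frames (the java.io.Reader constructor, reached through the BufferedReader constructor from TXTParser.parse) are what the JDK produces when a null Reader is handed to BufferedReader. The sketch below reproduces only that mechanism with the standard library. The class and method names (NullReaderDemo, tryWrap) are made up for illustration, and it makes no claim about why TXTParser ends up with a null reader for this particular input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical demo class, not part of Tika or Solr.
final class NullReaderDemo {

    // Wraps the given reader in a BufferedReader and reports what happened.
    static String tryWrap(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            return "read: " + reader.readLine();
        } catch (NullPointerException e) {
            // The java.io.Reader constructor rejects a null argument,
            // which is the failure at the bottom of the trace above.
            return "NullPointerException";
        } catch (IOException e) {
            return "IOException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryWrap(new StringReader("XE"))); // prints "read: XE"
        System.out.println(tryWrap(null));                   // prints "NullPointerException"
    }
}
```

If this is indeed what happens inside the parser, a null check on the reader before wrapping it would at least turn the crash into a more descriptive error.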


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

