lucene-java-user mailing list archives

From jake dsouza <jakedsouz...@gmail.com>
Subject Possible Unhandled Exception in org.apache.lucene.benchmark.byTask.feeds.DemoHTMLParser
Date Mon, 16 Apr 2012 17:00:33 GMT
Hi All,

I am trying to index the TREC GOV2 data set and I am getting a few
exceptions from this class. Please see the stack trace below.

java.lang.NullPointerException
    at org.apache.lucene.benchmark.byTask.feeds.DemoHTMLParser.parse(DemoHTMLParser.java:55)
    at org.apache.lucene.benchmark.byTask.feeds.TrecGov2Parser.parse(TrecGov2Parser.java:56)
    at org.apache.lucene.benchmark.byTask.feeds.TrecParserByPath.parse(TrecParserByPath.java:30)
    at org.apache.lucene.benchmark.byTask.feeds.TrecContentSource.getNextDocData(TrecContentSource.java:292)
    at com.Gov2Reader.indexDocs(Gov2Reader.java:117)
(each line above was logged at Apr 16, 2012 5:32:55 AM)

From what I noticed, line 56 of DemoHTMLParser has date =
dateFormat.parse(props.getProperty("date").trim()); but in this case
dateFormat is null, which is why the exception was thrown. The parse
method in TrecGov2Parser passes null to the DemoHTMLParser.parse method.

Because of this exception, some documents are skipped and never indexed.
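
One possible guard is sketched below as a standalone helper, not as a patch to DemoHTMLParser. The class name NullSafeDateParsing and the method parseDateOrNull are hypothetical and not part of Lucene; only the "date" property name and the dateFormat.parse(...trim()) call come from the line quoted above. The sketch assumes that a missing date is acceptable and that the document should still be indexed without one.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.Properties;

// Hypothetical helper (not a Lucene class): shows how the date parse
// could be guarded so a null DateFormat no longer aborts the document.
public class NullSafeDateParsing {

    // Returns the parsed date, or null when the format, the properties,
    // the "date" property, or its value cannot be used.
    public static Date parseDateOrNull(DateFormat dateFormat, Properties props) {
        if (dateFormat == null || props == null) {
            return null; // the caller supplied no format, as in the report above
        }
        String raw = props.getProperty("date");
        if (raw == null) {
            return null; // the document carries no date metadata
        }
        try {
            return dateFormat.parse(raw.trim());
        } catch (ParseException e) {
            return null; // unparseable date: keep the document, drop the date
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("date", " 2012-04-16 ");
        // The pattern is an assumption chosen for this demo only.
        DateFormat fmt = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);

        // A null format yields null instead of a NullPointerException.
        System.out.println(parseDateOrNull(null, props) == null);
        // A usable format still parses the trimmed value.
        System.out.println(parseDateOrNull(fmt, props) != null);
    }
}
```

Returning null here is a design choice for the sketch: it keeps the document and drops only the date. Whether the real parser should do that or fall back to a default format is a question for the maintainers.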

Regards
Jake
