lucene-general mailing list archives

From Raphael Osamede Omoregbee <or...@eecs.qmul.ac.uk>
Subject Re: Indexing with Lucene
Date Thu, 21 Jul 2011 02:59:17 GMT
On 20/07/11 22:32, Simon Willnauer wrote:
> On Wed, Jul 20, 2011 at 3:17 PM, raphael812<oro30@eecs.qmul.ac.uk>  wrote:
>> Hello everyone,
>>
>> I am quite new to Lucene and I am using the book Lucene in Action to learn.
>> I need help extracting the body content of an HTML page using Tika. The
>> implementation from the book only extracts the HTML's metadata, not the main
>> body content, which I need. Is it possible to extract body content from HTML
>> pages and PDFs, and if so, how?
>> Thanks for your help.
> hey,
>   this seems to be a tika / extraction specific question. you should
> try to ask this question on the tika list, I bet you get a quick
> response there!
>
> simon
>> Raphael
>>
>> --
>> View this message in context: http://lucene.472066.n3.nabble.com/Indexing-with-Lucene-tp3185409p3185409.html
>> Sent from the Lucene - General mailing list archive at Nabble.com.
>>
Hello all,
  I tried searching an index I created, but it gives me the
following error in NetBeans 6.9:
Exception in thread "main" org.apache.lucene.index.CorruptIndexException: Unknown format version: -11
         at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:249)
         at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:73)
         at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:677)
         at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69)
         at org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
         at org.apache.lucene.index.IndexReader.open(IndexReader.java:202)
         at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:63)
         at Searcher.search(Searcher.java:66)
         at Searcher.main(Searcher.java:59)

The trouble is that I am able to search that same index from the command
line. Does anyone have an idea why this happens? It was working in
NetBeans a few weeks ago, and now it throws this error.
Thanks for the help.
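
An "Unknown format version" error usually means the index was written by a newer Lucene release than the jar that is reading it, so one thing worth comparing is which Lucene jar each environment actually loads. The following is only a minimal diagnostic sketch: the class name ClasspathCheck and the helper describe are hypothetical, and it assumes nothing about how the NetBeans project is configured. It uses only standard JDK calls to print where a class was loaded from.

```java
import java.net.URL;
import java.security.CodeSource;

public class ClasspathCheck {

    // Hypothetical helper: returns one line saying where a class was loaded from.
    static String describe(String className) {
        try {
            Class<?> cls = Class.forName(className);
            // The code source is null for classes loaded by the bootstrap loader.
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            URL location = (source == null) ? null : source.getLocation();
            // The implementation version comes from the jar manifest and may be null.
            Package pkg = cls.getPackage();
            String version = (pkg == null) ? null : pkg.getImplementationVersion();
            return className + " loaded from " + location
                    + ", implementation version " + version;
        } catch (ClassNotFoundException e) {
            return className + " not found on the classpath";
        }
    }

    public static void main(String[] args) {
        // Both class names appear in the stack trace above.
        System.out.println(describe("org.apache.lucene.index.IndexReader"));
        System.out.println(describe("org.apache.lucene.index.SegmentInfos"));
    }
}
```

Running it once from NetBeans and once from the command line should show whether the two pick up different Lucene jars. If they do, the index was probably rebuilt with the newer jar while the NetBeans project still references the older one.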
