lucene-dev mailing list archives

From "Stas Chetvertkov" <schetvert...@oilspace.com>
Subject Lucene index got corrupted
Date Wed, 06 Nov 2002 09:22:56 GMT
Hi All,

We are using Lucene for indexing realtime news, and everything was working
fine until now. I have found that one of our Lucene indexes is corrupted:
every attempt to search it, optimize it, or merge it into another index
results in a 'read past EOF' exception.

My investigation showed that one of the segments in the index seems to be
invalid. Its field index file, '_4lvd.fdx', is 24 bytes long, while all the
field normalization factor files (_4lvd.f?) are 2 bytes long. Since the number
of documents is determined as length('_4lvd.fdx')/8, which gives 3 here,
Lucene tries to read a 3rd byte from the normalization factor files and fails.
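
The consistency check described above can be sketched in plain Java, without using any Lucene classes. This is only a rough sketch under the assumptions stated in this message: the document count is taken as the '.fdx' length divided by 8, and each normalization factor file is expected to hold at least one byte per document. The class name SegmentCheck, its method names, and the rule that normalization files are named '<segment>.f' followed by digits are illustrative assumptions, not part of Lucene's API.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: flags norm files that are
// shorter than the document count implied by the segment's .fdx file.
public class SegmentCheck {
    // Assumption from the message: 8 bytes per document in the .fdx file.
    public static final long FDX_ENTRY_BYTES = 8;

    // Number of documents implied by the length of the field index file.
    public static long docCountFromFdx(long fdxLength) {
        return fdxLength / FDX_ENTRY_BYTES;
    }

    // True if a norm file has fewer bytes than there are documents.
    public static boolean normsTooShort(long fdxLength, long normLength) {
        return normLength < docCountFromFdx(fdxLength);
    }

    // Assumption: norm files are named <segment>.f followed by digits only,
    // which excludes files such as <segment>.fdx, .fdt and .fnm.
    public static boolean isNormFile(String fileName, String segment) {
        String prefix = segment + ".f";
        if (!fileName.startsWith(prefix) || fileName.length() == prefix.length()) {
            return false;
        }
        for (int i = prefix.length(); i < fileName.length(); i++) {
            if (!Character.isDigit(fileName.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Lists the norm files of one segment that are too short for its .fdx file.
    public static List<String> findShortNormFiles(File indexDir, String segment) {
        List<String> shortFiles = new ArrayList<>();
        File fdx = new File(indexDir, segment + ".fdx");
        File[] files = indexDir.listFiles();
        if (!fdx.isFile() || files == null) {
            return shortFiles; // nothing to compare against
        }
        for (File f : files) {
            if (isNormFile(f.getName(), segment)
                    && normsTooShort(fdx.length(), f.length())) {
                shortFiles.add(f.getName());
            }
        }
        return shortFiles;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java SegmentCheck <indexDir> <segmentName>");
            return;
        }
        for (String name : findShortNormFiles(new File(args[0]), args[1])) {
            System.out.println("norm file too short: " + name);
        }
    }
}
```

Run against a copy of the index directory with the segment name as the second argument (for example '_4lvd'), it prints every normalization file that is too short. It only reads file lengths and does not modify the index.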

Does anyone have any idea how this index corruption could have occurred and
how I can fix it? Any advice would be extremely helpful.

Here is the exception that I get when trying to search this index:
Exception in thread "main" java.io.IOException: read past EOF
        at org.apache.lucene.store.InputStream.refill(Unknown Source)
        at org.apache.lucene.store.InputStream.readByte(Unknown Source)
        at org.apache.lucene.store.InputStream.readBytes(Unknown Source)
        at org.apache.lucene.index.SegmentReader.norms(Unknown Source)
        at org.apache.lucene.index.SegmentsReader.norms(Unknown Source)
        at org.apache.lucene.search.TermQuery.scorer(Unknown Source)
        at org.apache.lucene.search.Query.scorer(Unknown Source)
        at org.apache.lucene.search.IndexSearcher.search(Unknown Source)
        at org.apache.lucene.search.Hits.getMoreDocs(Unknown Source)
        at org.apache.lucene.search.Hits.<init>(Unknown Source)
        at org.apache.lucene.search.Searcher.search(Unknown Source)
        at org.apache.lucene.search.Searcher.search(Unknown Source)
        at Search.main(Search.java:31)

I am also attaching an archive of the segment that is causing the problem.

Regards,
Stas.
