lucene-java-user mailing list archives

From Trevor Boicey <tboi...@brit.ca>
Subject Is my index corrupt?
Date Wed, 28 Aug 2002 21:40:35 GMT
   I have a typical app that uses Lucene to index web pages. It has been 
working fine for a few months.

   I've noticed that a lot of Lucene's own methods have been throwing 
exceptions lately, always on the same document it seems. It is as if 
there is a document in my index that is internally broken.

   If I call optimize, it throws:

java.lang.ArrayIndexOutOfBoundsException: 110 >= 6

   ...and I suspect the optimize never completes; more on that below.

   I also have a loop that walks an IndexReader from 0 to reader.maxDoc() 
and looks at one of the fields. It throws the same exception when it 
attempts to read document #12367, although it works below and above that 
number.

   (i.e.: Document myDocument = reader.document(i); // throws when i == 12367)
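
   A minimal sketch of that kind of scan, written so it runs without 
Lucene on the classpath. DocSource and BrokenDocScan are hypothetical 
names used only for illustration, not Lucene classes; DocSource stands 
in for the two reader calls mentioned above (maxDoc and document(i)). 
Catching Exception per document lets the loop continue past a broken 
record and report every document number that fails, rather than 
stopping at the first one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the reader calls used in the scan.
// With Lucene available, document(i) would be reader.document(i).
interface DocSource {
    int maxDoc();
    Object document(int i) throws Exception;
}

class BrokenDocScan {
    // Walks every document number and records the ones that throw on read.
    static List<Integer> findUnreadable(DocSource source) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < source.maxDoc(); i++) {
            try {
                source.document(i);
            } catch (Exception e) {
                // Covers checked I/O failures as well as runtime ones
                // such as ArrayIndexOutOfBoundsException.
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Simulated index of 5 documents in which number 3 is unreadable.
        DocSource fake = new DocSource() {
            public int maxDoc() { return 5; }
            public Object document(int i) {
                if (i == 3) {
                    throw new ArrayIndexOutOfBoundsException("110 >= 6");
                }
                return "doc" + i;
            }
        };
        System.out.println(findUnreadable(fake));
    }
}
```

   If the list comes back with a single entry, that would be consistent 
with one damaged record; several scattered entries might point to wider 
damage in the index files.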

   It doesn't really affect my code since I can see every other 
document, but I have the feeling that my index can never optimize since 
it keeps failing on that record whenever it looks at it, either to 
optimize or to read it.

   Am I correct in guessing that this document is corrupt?

   Anyways, I tried hard-coding a delete for that document, and it did 
remove it, but now optimize fails with "java.io.IOException: read past EOF".

   I think my index is getting messed up: it should be shrinking 
quickly because my search scope is shrinking, but instead it's getting 
larger, likely due to all the failed optimize attempts.

   Any solution? Any way to stop it happening again?

-- 
Trevor Boicey, P. Eng.
Ottawa, Canada, tboicey@brit.ca
ICQ #17432933 http://www.brit.ca/~tboicey/
"I saw the Dipsy, but WHERE WAS THE DOODLE?" - Phil

