lucene-java-user mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: Best way to purposely corrupt an index?
Date Tue, 19 Apr 2005 21:22:03 GMT
Daniel Herlitz wrote:
> I would suggest you simply do not create unusable indexes. :-)  Handle 
> catch/throw/finally correctly and it should not present any problems.

In some use scenarios it's not that simple... Anyway, back to the 
original question: indexExists() just checks for the presence of the 
"segments" file, so it says nothing about the index consistency. The 
best way to make sure the index is valid is to open it, and catch an 
IOException.

To purposefully break the index you can do several things:

* delete the "segments" file itself (this will trash the whole index)

* delete one of the segments from the index (should generate an 
exception when opening)

* write a bunch of zeros in the middle of a segment file. This should 
result in an exception - but I'm not sure when; whether during open(), 
or during actual reading of affected data. You could then do the 
following: loop through all terms in the index (see IndexReader API), 
and for each term get its TermPositions. This will have to read the 
complete index. Looping through all documents, and reading each 
document, doesn't guarantee that - unstored fields are not loaded into 
documents.
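
A rough sketch of the first and third options, using only the standard Java I/O classes (no Lucene calls). The class and method names here (IndexCorruptor, zeroMiddle, deleteSegmentsFile) are made up for illustration, and which segment file you pass in is up to you; only the "segments" file name comes from the description above. Run it on a copy of the index, never on the real one.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for test setups; not part of Lucene.
final class IndexCorruptor {

    // Overwrites up to `length` bytes with zeros, starting at the midpoint
    // of the file. The file size is left unchanged.
    // Returns the number of bytes actually zeroed.
    public static int zeroMiddle(Path file, int length) throws IOException {
        if (length < 0) {
            throw new IllegalArgumentException("length must be >= 0");
        }
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            long size = raf.length();
            long start = size / 2;
            // Never write past the current end of the file.
            int n = (int) Math.min((long) length, size - start);
            raf.seek(start);
            raf.write(new byte[n]); // a new byte[] is all zeros
            return n;
        }
    }

    // Deletes the "segments" file from the given index directory.
    // Returns true if the file existed and was removed.
    public static boolean deleteSegmentsFile(Path indexDir) throws IOException {
        return Files.deleteIfExists(indexDir.resolve("segments"));
    }
}
```

After calling either method, try to open the index and walk the terms as described above to see at which point the exception shows up.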

-- 
Best regards,
Andrzej Bialecki
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

