lucene-dev mailing list archives

From Arvind Srinivasan
Subject Re: Potential Segment corruption
Date Thu, 26 May 2005 19:21:49 GMT
Thanks for the quick turnaround.

>I think the fix is much simpler.  This is a bug in FSDirectory.
>Directory.createOutput() should always create a new empty file, and
>FSDirectory's implementation does not ensure this.  It should try to
>delete the file before opening it and/or call RandomAccessFile.setLength(0).


>I've attached a patch.  Does this fix things for you?

The patch in the follow-up mail does look good. However, I have a few additional comments:

(a) The deleteFile call may fail, e.g. when the file is left open after the previous exception.
This makes me believe the ideal scenario is to never reuse a segment name
once the newSegment call has issued it. I strongly recommend this for 2.0.

(b) We should add a comment on the Directory interface so that people who implement
their own directory do not run into this issue. For that reason, I like
RandomAccessFile.setLength(0). However, since the code currently calls createFile
from many locations, we could add a comment something like this:

 /** Creates a new, empty file in the directory with the given name.
     Returns a stream writing this file.
     Implementations must ensure the returned OutputStream points to a
     zero-length file, even if a file with this name already exists. */
  public abstract OutputStream createFile(String name)
       throws IOException;
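
To make the contract concrete, here is a minimal sketch of how a file-system based implementation might honor it, using only plain java.io. The class and method names (EmptyFileSketch, openEmpty) are made up for illustration and are not part of Lucene. It combines both suggestions: a best-effort delete followed by setLength(0), so that the file is truncated even when the delete does not succeed.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;

// Hypothetical helper, not Lucene code: illustrates the "always empty" contract.
final class EmptyFileSketch {

    /** Opens the named file for writing and guarantees it is 0 bytes long. */
    static RandomAccessFile openEmpty(File dir, String name) throws IOException {
        File file = new File(dir, name);

        // Best effort only: the delete may fail, for example when a handle
        // from an earlier failed write is still open. The result is ignored
        // on purpose because the truncation below covers that case.
        file.delete();

        RandomAccessFile raf = new RandomAccessFile(file, "rw");
        try {
            // Discard any stale bytes left behind by a previous writer.
            raf.setLength(0);
        } catch (IOException e) {
            raf.close();
            throw e;
        }
        return raf;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("empty-file-sketch").toFile();
        File stale = new File(dir, "_1.fdt");

        // Simulate leftovers from a failed write that used the same name.
        try (FileOutputStream out = new FileOutputStream(stale)) {
            out.write(new byte[] {1, 2, 3, 4, 5});
        }

        try (RandomAccessFile raf = openEmpty(dir, "_1.fdt")) {
            System.out.println("length after reopen: " + raf.length());
        }
    }
}
```

Note that this only addresses stale bytes in a reused file. It does not remove the argument in (a) for never reusing a segment name.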

A side note: I had the task of recovering one such index. Initially, I thought
that since the bytes are only overwritten, the segment would not be corrupted and
could be recovered. However, the reader code relies on the file length (FieldsReader),
so if you do not know the exact length, you cannot recover the index.
It seems to me that with a few tweaks on the read side, the index could be made robust to
simple failures. We already have the ability to discard a corrupted segment and
allow searches to continue on other segments. I think this treads into
whiteboard territory. I am not sure if I can write to the whiteboard.

