Delivered-To: mailing list java-dev@lucene.apache.org
Date: Thu, 26 May 2005 12:21:49 -0700 (PDT)
From: Arvind Srinivasan
Subject: Re: Potential Segment corruption

Thanks for the quick turnaround.

>I think the fix is much simpler. This is a bug in FSDirectory.
>Directory.createOutput() should always create a new empty file, and
>FSDirectory's implementation does not ensure this. It should try to
>delete the file before opening it and/or call RandomAccessFile.setLength(0).

Agreed.

>I've attached a patch.
>Does this fix things for you?

The patch in the follow-up mail does look good. However, I have additional concerns:

(a) The deleteFile call may fail, e.g. when the file is left open after a previous exception. This makes me believe the ideal approach is not to reuse a segment name once the newSegment call has issued it. I strongly recommend this for 2.0.

(b) We should add a comment to the Directory interface so that people who implement their own directory do not run into this issue. For that reason, I like RandomAccessFile.setLength(0). However, since the code currently calls createFile from many locations, we could add a comment something like this:

---
  /** Creates a new, empty file in the directory with the given name.
      Returns a stream writing this file. Ensure the OutputStream
      points to a 0-byte-length file. */
  public abstract OutputStream createFile(String name)
       throws IOException;
---

A side note: I had the task of recovering one such index. Initially, I thought that since the bytes are only overwritten, the segment should not be corrupted and could be recovered. However, the reader code relies on the file length (FieldsReader), so if you do not know the exact length, you cannot recover the index. It seems to me that with a few tweaks on the read side, the index could be made robust to simple failures. We already have the ability to discard a corrupted segment and allow searches to continue on the other segments.

I think this treads into whiteboard territory, and I am not sure whether I can write to the whiteboard.
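
For illustration, the "delete before opening and/or setLength(0)" idea discussed in this thread can be sketched in plain Java. This is only a sketch under stated assumptions, not the attached patch and not Lucene code: the class name EmptyFileOpener and the method openEmpty are hypothetical, and only java.io calls are used. The delete is best effort, since it may fail when a handle is still open, so the truncation acts as the safety net.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical helper (not part of Lucene): always hands back an empty file. */
public class EmptyFileOpener {

    /**
     * Opens the given file for writing and guarantees it starts at 0 bytes,
     * even if a stale file with the same name already exists.
     */
    public static RandomAccessFile openEmpty(File file) throws IOException {
        if (file.exists()) {
            // Best effort only: the result is ignored on purpose, because
            // the delete can fail (e.g. the file is still held open).
            file.delete();
        }
        RandomAccessFile raf = new RandomAccessFile(file, "rw");
        try {
            // Truncate whatever survived the delete attempt.
            raf.setLength(0);
        } catch (IOException e) {
            raf.close();
            throw e;
        }
        return raf;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("segment", ".fdt");

        // Simulate leftover bytes from an earlier, failed write.
        RandomAccessFile stale = new RandomAccessFile(f, "rw");
        stale.write(new byte[64]);
        stale.close();

        // Reopen under the same name and write a shorter payload.
        RandomAccessFile out = openEmpty(f);
        out.write(new byte[8]);
        System.out.println(out.length()); // prints 8, not 64
        out.close();
        f.delete();
    }
}
```

Without the truncation step, the second write would overwrite only the first 8 bytes and the file would keep its old 64-byte length, which is the kind of mismatch a length-dependent reader cannot recover from.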