Return-Path: Delivered-To: apmail-lucene-java-dev-archive@www.apache.org Received: (qmail 6697 invoked from network); 28 Nov 2007 17:48:16 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 28 Nov 2007 17:48:16 -0000 Received: (qmail 69541 invoked by uid 500); 28 Nov 2007 17:48:00 -0000 Delivered-To: apmail-lucene-java-dev-archive@lucene.apache.org Received: (qmail 69489 invoked by uid 500); 28 Nov 2007 17:48:00 -0000 Mailing-List: contact java-dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-dev@lucene.apache.org Delivered-To: mailing list java-dev@lucene.apache.org Received: (qmail 69420 invoked by uid 99); 28 Nov 2007 17:47:59 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 28 Nov 2007 09:47:59 -0800 X-ASF-Spam-Status: No, hits=-100.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.4] (HELO brutus.apache.org) (140.211.11.4) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 28 Nov 2007 17:48:09 +0000 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id 596E071424F for ; Wed, 28 Nov 2007 09:47:43 -0800 (PST) Message-ID: <10244770.1196272063363.JavaMail.jira@brutus> Date: Wed, 28 Nov 2007 09:47:43 -0800 (PST) From: "Michael McCandless (JIRA)" To: java-dev@lucene.apache.org Subject: [jira] Commented: (LUCENE-1044) Behavior on hard power shutdown In-Reply-To: <9290721.1194030470805.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/LUCENE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546312 ] Michael McCandless commented on LUCENE-1044: -------------------------------------------- {quote} When autoCommit is true, then we should periodically commit automatically. When autoCommit is false, then nothing should be committed until the IndexWriter is closed. The ambiguous case is flush(). I think the reason for exposing flush() was to permit folks to commit without closing, so I think flush() should commit too, but we could add a separate commit() method that flushes and commits. {quote} I think deprecating flush(), renaming it to commit(), and clarifying the semantics to mean that commit() flushes pending docs/deletes, commits a new segments_N, syncs all files referenced by this commit, and blocks until the sync is complete, would make sense? And, commit() would in fact commit even when autoCommit is false (flush() doesn't commit now when autoCommit=false, which is indeed confusing). {quote} Perhaps the semantics of autoCommit=true should be altered so that it commits less than every flush. Is that what you were proposing? If so, then I think it's a good solution. Prior to 2.2 the commit semantics were poorly defined. Folks were encouraged to close() their IndexWriter to persist changes, and that's about all we said. 2.2's docs say that things are committed at every flush, but there was no sync, so I don't think changing this could break any applications. So I'm +1 for changing autoCommit=true to sync less than every flush, e.g., only after merges. I'd also argue that we should be vague in the documentation about precisely when autoCommit=true commits. If someone needs to know exactly when things are committed then they should be encouraged to explicitly flush(), not to rely on autoCommit. {quote} OK, I will test the "sync only when committing a merge" approach for performance. Hopefully a foreground sync() is fine given that with ConcurrentMergePolicy that's already in a background thread. This would be a nice simplification. And I agree we should be vague about, and users should never rely on, precisely when Lucene has really committed (sync'd) the changes to disk. I'll fix the javadocs. > Behavior on hard power shutdown > ------------------------------- > > Key: LUCENE-1044 > URL: https://issues.apache.org/jira/browse/LUCENE-1044 > Project: Lucene - Java > Issue Type: Bug > Components: Index > Environment: Windows Server 2003, Standard Edition, Sun Hotspot Java 1.5 > Reporter: venkat rangan > Assignee: Michael McCandless > Fix For: 2.3 > > Attachments: FSyncPerfTest.java, LUCENE-1044.patch, LUCENE-1044.take2.patch, LUCENE-1044.take3.patch, LUCENE-1044.take4.patch > > > When indexing a large number of documents, upon a hard power failure (e.g. pull the power cord), the index seems to get corrupted. We start a Java application as an Windows Service, and feed it documents. In some cases (after an index size of 1.7GB, with 30-40 index segment .cfs files) , the following is observed. > The 'segments' file contains only zeros. Its size is 265 bytes - all bytes are zeros. > The 'deleted' file also contains only zeros. Its size is 85 bytes - all bytes are zeros. > Before corruption, the segments file and deleted file appear to be correct. After this corruption, the index is corrupted and lost. > This is a problem observed in Lucene 1.4.3. We are not able to upgrade our customer deployments to 1.9 or later version, but would be happy to back-port a patch, if the patch is small enough and if this problem is already solved. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org For additional commands, e-mail: java-dev-help@lucene.apache.org