From: robert engels
Subject: Re: detected corrupted index / performance improvement
Date: Thu, 7 Feb 2008 09:21:05 -0600
To: java-dev@lucene.apache.org

This is simply not true. Two different issues are at play.

You cannot have a true 'commit' unless it is synchronous!

LUCENE-1044 might allow the index to be brought back to a consistent
state, but not one that is consistent with a synchronization point.

For example, I write three documents to the index. I call commit. It
returns. After this, those documents MUST be in the index under any
conditions. LUCENE-1044 does not ensure this.

If the operations (deletes and updates) are written to a log file
first and the log file is synced, then a failure during index
writing/merging can be fixed by rolling the log forward.

On Feb 7, 2008, at 4:29 AM, Michael McCandless wrote:

>
> In fact this is exactly the approach in the final patch on
> LUCENE-1044 and it gives far better performance than the simply
> synchronous (original) approach of syncing every segment file on
> close.
>
> Using a transaction log would also require periodic syncing.
>
> LUCENE-1044 syncs files after every merge, in the background thread
> of ConcurrentMergeScheduler, which is nice because it does not
> block further add/update/deleteDocument calls on the writer.
>
> Mike
>
> Andrew Zhang wrote:
>
>> On Feb 7, 2008 7:22 AM, robert engels wrote:
>>
>>> That doesn't help, with lazy writing/buffering by the OS, there is no
>>> guarantee that if the last written block is ok, that earlier blocks
>>> in the file are....
>>>
>>> The OS/drive is going to physically write them in the most efficient
>>> manner. Only after a sync would this hold true (which is what we are
>>> trying to avoid).
>>
>>
>> Hi, how about asynchronous commit? i.e. use a thread to sync the data.
>>
>> We only need to make sure that all data are written to the storage
>> before the next operation?
>>
>>>
>>>
>>> On Feb 6, 2008, at 5:15 PM, DM Smith wrote:
>>>
>>>>
>>>> On Feb 6, 2008, at 5:42 PM, Michael McCandless wrote:
>>>>
>>>>>
>>>>> robert engels wrote:
>>>>>
>>>>>> Do we have any way of determining if a segment is definitely
>>>>>> OK/VALID?
>>>>>
>>>>> The only way I know is the CheckIndex tool, and it's rather slow
>>>>> (and it's not clear that it always catches all corruption).
>>>>
>>>> Just a thought. It seems that the discussion has revolved around
>>>> whether a crash or similar event has left the file in an
>>>> inconsistent state. Without looking into how it is actually done,
>>>> I'm going to guess that the writing is done from the start of the
>>>> file to its end. That is, no "out of order" writing.
>>>>
>>>> If this is the case, how about adding a marker to the end of the
>>>> file of a known size and pattern. If it is present then it is
>>>> presumed that there were no errors in getting to that point.
>>>>
>>>> Even with out of order writing, one could write an 'INVALID' marker
>>>> at the beginning of the operation and then upon reaching the end of
>>>> the writing, replace it with the valid marker.
>>>>
>>>> If neither marker is found then the index is one from before the
>>>> capability was added and nothing can be said about the validity.
>>>>
>>>> -- DM
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>
>>
>> --
>> Best regards,
>> Andrew Zhang
>>
>> db4o - database for Android: www.db4o.com
>> http://zhanghuangzhu.blogspot.com/
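
The log-first approach argued for in the message (write the operations to a log, sync the log, roll forward after a failure) could look roughly like the sketch below. This is only an illustration under stated assumptions: OperationLog, its record format (4-byte length prefix plus UTF-8 payload), and the operation strings are hypothetical and are not part of Lucene or of the LUCENE-1044 patch. The one real dependency is java.nio's FileChannel.force, which asks the OS to flush the file to the storage device; how durable that is still depends on the OS and the drive.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical operation log, for illustration only (not a Lucene class). */
class OperationLog implements AutoCloseable {
    private final FileChannel channel;

    OperationLog(Path path) {
        try {
            channel = FileChannel.open(path, StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Appends one operation as [int length][UTF-8 bytes]. Not yet durable. */
    void append(String op) {
        byte[] payload = op.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
        buf.putInt(payload.length).put(payload).flip();
        try {
            while (buf.hasRemaining()) {
                channel.write(buf);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Synchronous commit: returns only after the log has been forced to storage. */
    void commit() {
        try {
            channel.force(true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads back every complete record; a truncated record at the tail is ignored. */
    static List<String> replay(Path path) {
        ByteBuffer buf;
        try {
            buf = ByteBuffer.wrap(Files.readAllBytes(path));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> ops = new ArrayList<>();
        while (buf.remaining() >= 4) {
            int len = buf.getInt();
            if (len < 0 || len > buf.remaining()) {
                break; // incomplete record: it was never committed
            }
            byte[] payload = new byte[len];
            buf.get(payload);
            ops.add(new String(payload, StandardCharsets.UTF_8));
        }
        return ops;
    }

    @Override
    public void close() {
        try {
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After a crash during index writing/merging, recovery would call OperationLog.replay and re-apply the returned operations to the last consistent index. A real log would also need per-record checksums and truncation once the index itself has been synced; both are left out here.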
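
DM Smith's end-of-file marker idea from the quoted thread can be sketched in the same spirit. Everything named here (SegmentTrailer, the SEGMENT-OK pattern, the truncateTail crash simulation) is made up for illustration and does not reflect how Lucene writes segment files. Note the objection raised in the thread: with lazy OS buffering, a marker on disk says nothing about earlier blocks unless the file was synced, which is why this sketch calls force before returning.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

/** Hypothetical trailer-marker helper, for illustration only (not a Lucene class). */
class SegmentTrailer {
    // Assumed marker pattern of known size; any fixed byte sequence would do.
    static final byte[] VALID_MARKER = "SEGMENT-OK".getBytes(StandardCharsets.US_ASCII);

    /** Writes the data front to back, then the marker, then syncs the file. */
    static void writeWithMarker(Path file, byte[] data) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            writeFully(ch, ByteBuffer.wrap(data));
            writeFully(ch, ByteBuffer.wrap(VALID_MARKER));
            ch.force(true); // without this, the marker proves nothing about earlier blocks
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true only if the last bytes of the file equal the marker. */
    static boolean hasValidMarker(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long start = ch.size() - VALID_MARKER.length;
            if (start < 0) {
                return false; // file is shorter than the marker itself
            }
            ByteBuffer tail = ByteBuffer.allocate(VALID_MARKER.length);
            while (tail.hasRemaining()) {
                if (ch.read(tail, start + tail.position()) < 0) {
                    return false;
                }
            }
            return Arrays.equals(tail.array(), VALID_MARKER);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Simulates a torn write by dropping the last few bytes of the file. */
    static void truncateTail(Path file, int bytes) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ch.truncate(Math.max(0, ch.size() - bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void writeFully(FileChannel ch, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
    }
}
```

A file with no marker at all is indistinguishable here from a torn one; the thread's suggestion of treating marker-less files as "written before the capability existed" would need an extra format version check.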