From: robert engels
Subject: Re: detected corrupted index / performance improvement
Date: Thu, 7 Feb 2008 16:00:05 -0600
To: java-dev@lucene.apache.org

My point is that most applications need to call commit, and commit in Lucene is very slow.

You don't pay 2x the IO cost, mainly because only the log file needs to be sync'd. The index only has to be sync'd eventually, in order to prune the log file, and that can be done in the background, which improves the performance of the update/commit cycle. Also, writing the log file is very efficient because it is an append-only, sequential operation. Writing the segment files touches multiple files, essentially causing random-access writes.

I guess I don't see the benefit of 1044 if you can't guarantee the index is at a certain point (you can, by calling commit(), but it is VERY slow).

I was thinking a better design is to serialize the documents/operations to disk, maintain an in-memory index of updates/removes, and then merge that into the main index when needed, using a parallel reader on both in the meantime.

On Feb 7, 2008, at 3:06 PM, Michael McCandless wrote:

>
> robert engels wrote:
>
>> I might be misunderstanding 1044. There were several approaches,
>> and I am not certain which one was final.
>
> The final approach (take 7) is to make the index consistent (sync
> the files) after finishing a merge.
> Also, a new method ("commit") is added which will force a
> synchronous sync while you wait. Close also does this.
>
>> I reread the bug and am still a bit unclear.
>>
>> If the segments are sync'd as part of the commit, then yes, that
>> would suffice. The merges don't need to commit; you just can't
>> delete the segments until the merge completes.
>>
>> I think that building the segments and syncing each segment
>> (since in most cases the caller is going to call commit as part of
>> each update) is going to be slower than writing the documents/
>> operations to a log file, but a lot depends on how Lucene is used
>> (interactive vs. batch, lots of updates vs. a few).
>
> Well, and based on how frequently you prune the transaction log
> (sync the real files). I think the 2X IO cost is going to make
> performance worse with the transaction log.
>
>> I am not sure how deletions are impacted by all of this.
>
> Should be fine? The *.del files need to be sync'd just like the
> rest of the segments files.
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
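To make the log-file argument concrete, here is a minimal Java sketch of an append-only transaction log. It is an illustration under stated assumptions, not Lucene code: the class name TransactionLog, the one-line-per-operation format, and the prune-after-index-sync policy are all hypothetical. It shows why commit can be cheap in this design: every append is a sequential write to a single file, and commit is one FileChannel.force() on that file, while pruning waits until the index files themselves have been synced.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/**
 * Hypothetical append-only transaction log (not a Lucene API).
 * Each operation is stored as one UTF-8 line at the end of the file.
 */
class TransactionLog implements AutoCloseable {
    private final Path path;
    private final FileChannel channel;

    public TransactionLog(Path path) throws IOException {
        this.path = path;
        this.channel = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        // Start writing at the current end so existing entries are kept.
        channel.position(channel.size());
    }

    /** Sequential append; nothing is forced to disk yet. */
    public void append(String op) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap((op + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
    }

    /** Commit: one fsync of one file makes every appended operation durable.
     *  force(true) also flushes metadata such as the file length. */
    public void commit() throws IOException {
        channel.force(true);
    }

    /** Assumed policy: call only after the index files have been synced
     *  (for example by a background thread); the log is then emptied. */
    public void prune() throws IOException {
        channel.truncate(0); // also moves the write position back to 0
        channel.force(true);
    }

    /** Recovery: the operations appended since the last prune. */
    public List<String> replay() throws IOException {
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

A caller would append each operation, call commit() when it needs durability, and call prune() only once the segment files have been synced in the background; after a crash, replay() returns the operations that still have to be applied to the index.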
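The alternative design from the reply (operations serialized to disk, recent updates/removes held in an in-memory index, merged into the main index when needed) can also be sketched. This is a hypothetical illustration: OverlayIndex and its methods are invented for this example, and plain maps stand in for both the main index and the in-memory index. The get() method plays the role the email assigns to "a parallel reader on both": pending changes take precedence over the main index until merge() folds them in.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical overlay of recent updates/removes on top of a "main" index.
 * Plain maps stand in for real indexes; this is not a Lucene API.
 */
class OverlayIndex {
    private final Map<String, String> main = new HashMap<>();           // stands in for the on-disk index
    private final Map<String, String> pendingUpdates = new HashMap<>(); // in-memory adds/updates
    private final Set<String> pendingRemoves = new HashSet<>();         // in-memory deletes

    /** Add or replace a document; cancels any pending remove of the same id. */
    public void update(String id, String doc) {
        pendingRemoves.remove(id);
        pendingUpdates.put(id, doc);
    }

    /** Remove a document; cancels any pending update of the same id. */
    public void remove(String id) {
        pendingUpdates.remove(id);
        pendingRemoves.add(id);
    }

    /** Read through both layers: pending changes win over the main index. */
    public Optional<String> get(String id) {
        if (pendingRemoves.contains(id)) return Optional.empty();
        if (pendingUpdates.containsKey(id)) return Optional.of(pendingUpdates.get(id));
        return Optional.ofNullable(main.get(id));
    }

    /** Fold the in-memory changes into the main index "when needed". */
    public void merge() {
        for (String id : pendingRemoves) main.remove(id);
        main.putAll(pendingUpdates);
        pendingRemoves.clear();
        pendingUpdates.clear();
    }

    public int pendingCount() { return pendingUpdates.size() + pendingRemoves.size(); }

    public int mainSize() { return main.size(); }
}
```

Because update() and remove() each cancel the other's pending entry for the same id, the two pending sets never overlap, so the order in which merge() applies them does not matter.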