Return-Path:
Delivered-To: apmail-lucene-dev-archive@www.apache.org
Received: (qmail 72834 invoked from network); 22 Nov 2010 02:51:05 -0000
Received: from unknown (HELO mail.apache.org) (140.211.11.3)
    by 140.211.11.9 with SMTP; 22 Nov 2010 02:51:05 -0000
Received: (qmail 47841 invoked by uid 500); 22 Nov 2010 02:51:36 -0000
Delivered-To: apmail-lucene-dev-archive@lucene.apache.org
Received: (qmail 47600 invoked by uid 500); 22 Nov 2010 02:51:36 -0000
Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@lucene.apache.org
Delivered-To: mailing list dev@lucene.apache.org
Received: (qmail 47593 invoked by uid 99); 22 Nov 2010 02:51:35 -0000
Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136)
    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 22 Nov 2010 02:51:35 +0000
X-ASF-Spam-Status: No, hits=-2000.0 required=10.0
    tests=ALL_TRUSTED
X-Spam-Check-By: apache.org
Received: from [140.211.11.22] (HELO thor.apache.org) (140.211.11.22)
    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 22 Nov 2010 02:51:35 +0000
Received: from thor (localhost [127.0.0.1])
    by thor.apache.org (8.13.8+Sun/8.13.8) with ESMTP id oAM2pEC7003025
    for ; Mon, 22 Nov 2010 02:51:14 GMT
Message-ID: <4058874.226831290394274356.JavaMail.jira@thor>
Date: Sun, 21 Nov 2010 21:51:14 -0500 (EST)
From: "Jason Rutherglen (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] Commented: (LUCENE-2680) Improve how IndexWriter flushes deletes against existing segments
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394


    [ https://issues.apache.org/jira/browse/LUCENE-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934388#action_12934388 ]

Jason Rutherglen commented on LUCENE-2680:
------------------------------------------

I've isolated the mismatch in num docs between the CMS vs.
SMS generated indexes to applying the deletes to the merging segments
(whereas currently we were/are not applying deletes to merging segments
and TestStressIndexing2 passes). Assuming the deletes are being applied
correctly to the merging segments, perhaps the logic of gathering up
forward segment deletes is incorrect somehow in the concurrent merge
case. When deletes were held in a map per segment, this test was
passing.

> Improve how IndexWriter flushes deletes against existing segments
> -----------------------------------------------------------------
>
>                 Key: LUCENE-2680
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2680
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch, LUCENE-2680.patch
>
>
> IndexWriter buffers up all deletes (by Term and Query) and only
> applies them if 1) commit or NRT getReader() is called, or 2) a merge
> is about to kickoff.
> We do this because, for a large index, it's very costly to open a
> SegmentReader for every segment in the index. So we defer as long as
> we can. We do it just before merge so that the merge can eliminate
> the deleted docs.
> But, most merges are small, yet in a big index we apply deletes to all
> of the segments, which is really very wasteful.
> Instead, we should only apply the buffered deletes to the segments
> that are about to be merged, and keep the buffer around for the
> remaining segments.
> I think it's not so hard to do; we'd have to have generations of
> pending deletions, because the newly merged segment doesn't need the
> same buffered deletions applied again. So every time a merge kicks
> off, we pinch off the current set of buffered deletions, open a new
> set (the next generation), and record which segment was created as of
> which generation.
> This should be a very sizable gain for large indices that mix
> deletes, though, less so in flex since opening the terms index is much
> faster.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
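
The generation scheme in the quoted issue description might be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the attached patches and not a Lucene API: the class GenerationalDeletes and its methods addSegment, bufferDelete, pendingFor, and startMerge are hypothetical names. Deletes are modeled as plain strings, a segment is assumed to need every delete buffered in its recorded generation or later, and deletes interleaved with documents still buffered in RAM are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of generational buffered deletes; not Lucene code. */
public class GenerationalDeletes {
    // generations.get(g) holds the delete terms buffered during generation g.
    private final List<List<String>> generations = new ArrayList<>();
    // Segment name -> first generation whose deletes are still pending for it.
    private final Map<String, Integer> segmentGen = new HashMap<>();

    public GenerationalDeletes() {
        generations.add(new ArrayList<>()); // generation 0
    }

    /** Registers a segment as of the current generation. */
    public void addSegment(String name) {
        segmentGen.put(name, generations.size() - 1);
    }

    /** Buffers a delete in the current (newest) generation. */
    public void bufferDelete(String term) {
        generations.get(generations.size() - 1).add(term);
    }

    /** Returns all buffered deletes that have not yet been applied to the segment. */
    public List<String> pendingFor(String segment) {
        Integer gen = segmentGen.get(segment);
        if (gen == null) {
            throw new IllegalArgumentException("unknown segment: " + segment);
        }
        List<String> pending = new ArrayList<>();
        for (int g = gen; g < generations.size(); g++) {
            pending.addAll(generations.get(g));
        }
        return pending;
    }

    /**
     * Collects the deletes to apply to the merging segments only, pinches off
     * the current generation, and records the merged segment as of the new one.
     */
    public Map<String, List<String>> startMerge(List<String> merging, String mergedName) {
        Map<String, List<String>> toApply = new LinkedHashMap<>();
        for (String seg : merging) {
            toApply.put(seg, pendingFor(seg));
        }
        for (String seg : merging) {
            segmentGen.remove(seg); // replaced by the merged segment
        }
        generations.add(new ArrayList<>()); // open the next generation
        segmentGen.put(mergedName, generations.size() - 1);
        return toApply;
    }

    public static void main(String[] args) {
        GenerationalDeletes d = new GenerationalDeletes();
        d.addSegment("A");
        d.addSegment("B");
        d.addSegment("C");
        d.bufferDelete("id:1");
        System.out.println(d.startMerge(Arrays.asList("A", "B"), "AB")); // {A=[id:1], B=[id:1]}
        d.bufferDelete("id:2");
        System.out.println(d.pendingFor("AB")); // [id:2]
        System.out.println(d.pendingFor("C"));  // [id:1, id:2]
    }
}
```

In this sketch the segment left out of the merge (C) keeps both deletes pending, while the merged segment (AB) only sees deletes buffered after the merge kicked off. A fuller version would presumably also drop generations that no remaining segment refers to.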