From: "Michael McCandless (JIRA)"
To: dev@lucene.apache.org
Date: Fri, 1 Oct 2010 18:23:33 -0400 (EDT)
Subject: [jira] Created: (LUCENE-2680) Improve how IndexWriter flushes deletes against existing segments

Improve how IndexWriter flushes deletes against existing segments
-----------------------------------------------------------------

                 Key: LUCENE-2680
                 URL: https://issues.apache.org/jira/browse/LUCENE-2680
             Project: Lucene - Java
          Issue Type: Improvement
            Reporter: Michael McCandless
             Fix For: 4.0


IndexWriter buffers all deletes (by Term and by Query) and only applies them when 1) commit or NRT getReader() is called, or 2) a merge is about to kick off.

We do this because, for a large index, it's very costly to open a SegmentReader for every segment in the index, so we defer as long as we can. We apply them just before a merge so that the merge can eliminate the deleted docs.

But most merges are small, yet in a big index we apply the deletes to all of the segments, not just the ones being merged, which is very wasteful.

Instead, we should apply the buffered deletes only to the segments that are about to be merged, and keep the buffer around for the remaining segments.

I think it's not so hard to do; we'd need generations of pending deletions, because the newly merged segment doesn't need the same buffered deletions applied again. So every time a merge kicks off, we pinch off the current set of buffered deletions, open a new set (the next generation), and record which segment was created as of which generation.

This should be a sizable gain for large indices that mix in deletes, though less so in flex, since opening the terms index is much faster there.
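The generation scheme proposed above can be sketched as a small, self-contained Java toy model. Everything in it (the `GenerationalDeletes`, `Segment` and `DeletePacket` classes and the `flush`, `merge` and `applyAll` methods) is hypothetical and is not Lucene's IndexWriter API. To stay short it assumes each document is a single term, that documents are added and flushed in one step, and, going beyond the proposal, that every flush also starts a new delete generation, so a segment's creation generation cleanly separates the deletes that apply to it from those that predate it. The `segmentsVisited` counter stands in for the cost of opening a SegmentReader.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical toy model of generational buffered deletes (not Lucene's real API). */
public class GenerationalDeletes {

    /** Toy segment: each document is represented by a single term (e.g. its id). */
    static final class Segment {
        final String name;
        final List<String> docs;
        final BitSet deleted = new BitSet();
        long appliedThroughGen; // packets with gen <= this are applied or predate the segment

        Segment(String name, List<String> docs, long appliedThroughGen) {
            this.name = name;
            this.docs = docs;
            this.appliedThroughGen = appliedThroughGen;
        }

        int liveDocs() { return docs.size() - deleted.cardinality(); }
    }

    /** A frozen ("pinched off") generation of buffered delete terms. */
    static final class DeletePacket {
        final long gen;
        final Set<String> terms;
        DeletePacket(long gen, Set<String> terms) { this.gen = gen; this.terms = terms; }
    }

    final List<Segment> segments = new ArrayList<>();
    final List<DeletePacket> packets = new ArrayList<>();
    Set<String> openBuffer = new HashSet<>();
    long nextGen = 1;        // generation the open buffer receives when pinched off
    int segmentsVisited = 0; // stands in for "SegmentReaders opened to apply deletes"

    /** Buffer a delete-by-term; it targets every segment that exists right now. */
    void deleteByTerm(String term) { openBuffer.add(term); }

    /** Freeze the open buffer as the next generation and start a fresh one. */
    void pinch() {
        if (!openBuffer.isEmpty()) {
            packets.add(new DeletePacket(nextGen, openBuffer));
            openBuffer = new HashSet<>();
        }
        nextGen++;
    }

    /** Flush a new segment; deletes pinched off before it never apply to it. */
    Segment flush(String name, String... docTerms) {
        pinch();
        Segment s = new Segment(name, new ArrayList<>(List.of(docTerms)), nextGen - 1);
        segments.add(s);
        return s;
    }

    /** Apply only the generations this segment has not seen yet (caller pinches first). */
    void applyPending(Segment seg) {
        boolean visited = false;
        for (DeletePacket p : packets) {
            if (p.gen <= seg.appliedThroughGen) continue;
            visited = true;
            for (int i = 0; i < seg.docs.size(); i++) {
                if (p.terms.contains(seg.docs.get(i))) seg.deleted.set(i);
            }
        }
        if (visited) segmentsVisited++;
        seg.appliedThroughGen = nextGen - 1;
    }

    /** Merge: apply pending deletes only to the merging segments, then drop deleted docs. */
    Segment merge(String newName, List<Segment> toMerge) {
        pinch();
        List<String> live = new ArrayList<>();
        for (Segment s : toMerge) {
            applyPending(s);
            for (int i = 0; i < s.docs.size(); i++) {
                if (!s.deleted.get(i)) live.add(s.docs.get(i));
            }
        }
        segments.removeAll(toMerge);
        // The merged segment has seen every generation so far, so none is re-applied to it.
        Segment merged = new Segment(newName, live, nextGen - 1);
        segments.add(merged);
        prune();
        return merged;
    }

    /** Commit / getReader(): every segment must see every pending generation. */
    void applyAll() {
        pinch();
        for (Segment s : segments) applyPending(s);
        prune();
    }

    /** Drop packets that every remaining segment has already moved past. */
    void prune() {
        long floor = Long.MAX_VALUE;
        for (Segment s : segments) floor = Math.min(floor, s.appliedThroughGen);
        final long minApplied = floor;
        packets.removeIf(p -> p.gen <= minApplied);
    }

    public static void main(String[] args) {
        GenerationalDeletes w = new GenerationalDeletes();
        for (int i = 0; i < 10; i++) w.flush("_" + i, "id" + i, "x" + i);
        w.deleteByTerm("id0");
        w.deleteByTerm("id9");
        Segment m = w.merge("_m0", List.of(w.segments.get(0), w.segments.get(1)));
        System.out.println(m.docs + " visited=" + w.segmentsVisited);
        w.applyAll();
        System.out.println("visited=" + w.segmentsVisited + " pending=" + w.packets.size());
    }
}
```

In `main`, ten segments are flushed and two deletes are buffered; merging only `_0` and `_1` touches just those two segments, so `segmentsVisited` is 2 rather than 10, and the frozen packet stays pending for the other eight until `applyAll()` (standing in for commit or getReader()). Because the merged segment records the current generation, the same deletes are never applied to it again, and a packet is pruned once every segment has moved past its generation.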