From: "Michael McCandless (JIRA)"
To: java-dev@lucene.apache.org
Date: Thu, 12 Nov 2009 13:14:39 +0000 (UTC)
Subject: [jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl

[
https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776985#action_12776985 ]

Michael McCandless commented on LUCENE-1526:
--------------------------------------------

bq. 2) CPU and memory starvation - monitoring CPU and memory usage, the machine seems very starved, and I think that leads to performance differences more than the extra array lookup.

CPU starvation is fully expected (this is a redline test).

Memory starvation is interesting, because the bit vectors should all be transient, and should "die young" from the GC's standpoint. Plus each one costs only one bit per doc (roughly maxDoc/8 bytes), and only the segments that have deletions get their bit vector cloned. Are you starting from an optimized index?

Oh, here's one idea: how many searches does your test allow to be in flight at once? (Or: how large a thread pool are you using on the server?) Since you effectively reopen per search, each search holds its own cloned copy of the deleted docs. If you allow many searches in flight, that could explain it.

> For near real-time search, use paged copy-on-write BitVector impl
> -----------------------------------------------------------------
>
>                 Key: LUCENE-1526
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1526
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1526.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> SegmentReader currently uses a BitVector to represent deleted docs.
> When performing rapid clone (see LUCENE-1314) and delete operations,
> performing a copy on write of the BitVector can become costly because
> the entire underlying byte array must be created and copied. A way to
> make this clone delete process faster is to implement tombstones, a
> term coined by Marvin Humphrey.
> Tombstones represent new deletions
> plus the incremental deletions from previously reopened readers in
> the current reader.
> The proposed implementation of tombstones is to accumulate deletions
> into an int array represented as a DocIdSet. With LUCENE-1476,
> SegmentTermDocs iterates over deleted docs using a DocIdSet rather
> than accessing the BitVector by calling get. This allows a BitVector
> and a set of tombstones to be combined (a doc is deleted if it is set
> in either) as the current reader's deleted docs.
> A tombstone merge policy needs to be defined to determine when to
> merge tombstone DocIdSets into a new deleted docs BitVector, as too
> many tombstones would eventually be detrimental to performance. A
> probable implementation will merge tombstones based on the number of
> tombstones and the total number of documents in the tombstones. The
> merge policy may be set in the clone/reopen methods or on the
> IndexReader.
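To put rough numbers on the in-flight theory in the comment, here is a back-of-the-envelope sketch. It assumes every reopen clones the deleted-docs bits of each segment that has deletions, at one bit per doc, and that each in-flight search pins its own clone until it finishes. DeletedDocsMemoryEstimate and its methods are hypothetical names, not Lucene API.

```java
// Hypothetical back-of-the-envelope helper, not Lucene API.
// Assumes each reopen clones the deleted-docs bits of every segment that has
// deletions (one bit per doc) and each in-flight search holds its own clone.
final class DeletedDocsMemoryEstimate {
    private DeletedDocsMemoryEstimate() {}

    // Bytes copied by one reopen: ceil(maxDoc / 8) per segment with deletions.
    static long bytesPerReopen(int[] maxDocOfSegmentsWithDeletions) {
        long total = 0;
        for (int maxDoc : maxDocOfSegmentsWithDeletions) {
            total += (maxDoc + 7L) / 8; // long arithmetic avoids int overflow
        }
        return total;
    }

    // Bytes held live at once when every in-flight search pins its own clone.
    static long bytesInFlight(int[] maxDocOfSegmentsWithDeletions, int searchesInFlight) {
        return bytesPerReopen(maxDocOfSegmentsWithDeletions) * searchesInFlight;
    }
}
```

For example, with one 10M-doc segment that has deletions and 100 searches in flight (illustrative numbers, not measurements from this thread), bytesInFlight(new int[] {10_000_000}, 100) gives 125,000,000 bytes, about 119 MiB live at once, even though each individual clone is short-lived.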
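The issue title proposes a paged copy-on-write BitVector. A minimal sketch of that idea, assuming fixed 512-byte pages and single-threaded use: clone() copies only the page table, and a page is duplicated the first time either copy writes to it, so a reopen that deletes a handful of docs copies a handful of pages rather than the whole byte array. PagedCowBitVector and its page size are hypothetical; this is not Lucene's BitVector API.

```java
import java.util.Arrays;

// Hypothetical paged copy-on-write bit vector (a sketch, not Lucene's BitVector API).
// Not thread-safe: clone() also updates the original's ownership flags.
final class PagedCowBitVector implements Cloneable {
    // Assumed page size: 64 longs = 4096 bits = 512 bytes.
    private static final int WORDS_PER_PAGE = 64;
    private static final int BITS_PER_PAGE = WORDS_PER_PAGE * 64;

    private final int size;
    private long[][] pages;   // page table; page arrays may be shared with clones
    private boolean[] owned;  // owned[p]: this instance may write pages[p] in place

    PagedCowBitVector(int size) {
        this.size = size;
        int numPages = (size + BITS_PER_PAGE - 1) / BITS_PER_PAGE;
        this.pages = new long[numPages][WORDS_PER_PAGE];
        this.owned = new boolean[numPages];
        Arrays.fill(owned, true);
    }

    boolean get(int bit) {
        checkIndex(bit);
        long word = pages[bit / BITS_PER_PAGE][(bit % BITS_PER_PAGE) >>> 6];
        return (word & (1L << (bit & 63))) != 0;
    }

    void set(int bit) {
        checkIndex(bit);
        int p = bit / BITS_PER_PAGE;
        if (!owned[p]) {
            // Copy-on-write: duplicate only the page being modified.
            pages[p] = pages[p].clone();
            owned[p] = true;
        }
        pages[p][(bit % BITS_PER_PAGE) >>> 6] |= 1L << (bit & 63);
    }

    // Costs O(number of pages) instead of copying size / 8 bytes.
    @Override
    public PagedCowBitVector clone() {
        try {
            PagedCowBitVector copy = (PagedCowBitVector) super.clone();
            copy.pages = pages.clone();              // shallow: shares every page array
            copy.owned = new boolean[pages.length];  // clone owns no pages yet
            Arrays.fill(owned, false);               // original must now copy before writing too
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);             // cannot happen: we implement Cloneable
        }
    }

    private void checkIndex(int bit) {
        if (bit < 0 || bit >= size) {
            throw new IndexOutOfBoundsException("bit " + bit + " not in [0, " + size + ")");
        }
    }
}
```

One simplification worth noting: after a clone, both sides give up ownership of every page, so the original also copies a page on its next write even if the clone has since been discarded. A real implementation might track sharing with per-page reference counts instead.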
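The tombstone design in the description can be sketched in the same spirit. Assumptions: java.util.BitSet stands in for BitVector, tombstones are a sorted int[] rather than a general DocIdSet, and the merge policy is a single hypothetical ratio of tombstone count to maxDoc (the description suggests also weighing the number of tombstone sets). Each delete returns a new immutable view, so an older reader keeps its own snapshot while sharing the base bits. TombstoneDeletedDocs and mergeRatio are illustrative names, not Lucene API.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical deleted-docs view: base bit set plus sorted int[] tombstones.
// Instances are immutable once created, so a reopened reader can share 'base'
// with older readers and only copy the small tombstone array.
final class TombstoneDeletedDocs {
    private final BitSet base;       // deletions folded in by the last merge; never mutated
    private final int[] tombstones;  // deletions since then, sorted ascending, no duplicates
    private final int maxDoc;

    TombstoneDeletedDocs(int maxDoc) {
        this(new BitSet(maxDoc), new int[0], maxDoc);
    }

    private TombstoneDeletedDocs(BitSet base, int[] tombstones, int maxDoc) {
        this.base = base;
        this.tombstones = tombstones;
        this.maxDoc = maxDoc;
    }

    // A doc is deleted if it is set in the base OR present as a tombstone.
    boolean isDeleted(int doc) {
        return base.get(doc) || Arrays.binarySearch(tombstones, doc) >= 0;
    }

    int tombstoneCount() {
        return tombstones.length;
    }

    // Returns a new view with 'doc' deleted; 'this' is left unchanged.
    // Hypothetical merge policy: once tombstones exceed mergeRatio * maxDoc,
    // fold them into a fresh copy of the base bit set.
    TombstoneDeletedDocs delete(int doc, double mergeRatio) {
        if (isDeleted(doc)) {
            return this;
        }
        int pos = -(Arrays.binarySearch(tombstones, doc) + 1); // insertion point
        int[] next = new int[tombstones.length + 1];
        System.arraycopy(tombstones, 0, next, 0, pos);
        next[pos] = doc;
        System.arraycopy(tombstones, pos, next, pos + 1, tombstones.length - pos);

        if (next.length > mergeRatio * maxDoc) {
            BitSet merged = (BitSet) base.clone();
            for (int d : next) {
                merged.set(d);
            }
            return new TombstoneDeletedDocs(merged, new int[0], maxDoc);
        }
        return new TombstoneDeletedDocs(base, next, maxDoc);
    }
}
```

Copying the tombstone array on every delete keeps each view immutable at O(tombstones) cost per delete; the merge threshold is what keeps that array, and the per-lookup binary search, small.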