Message-ID: <19098386.1174594592143.JavaMail.jira@brutus>
Date: Thu, 22 Mar 2007 13:16:32 -0700 (PDT)
From: "Michael McCandless (JIRA)"
To: java-dev@lucene.apache.org
Reply-To: java-dev@lucene.apache.org
Subject: [jira] Created: (LUCENE-845) If you "flush by RAM usage" then IndexWriter may over-merge

If you "flush by RAM usage" then IndexWriter may over-merge
-----------------------------------------------------------

Key: LUCENE-845
URL: https://issues.apache.org/jira/browse/LUCENE-845
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 2.1
Reporter: Michael McCandless
Assigned To: Michael McCandless
Priority: Minor

I think a good way to maximize Lucene's indexing performance for a given amount of RAM is to flush the added documents (writer.flush()) whenever RAM usage (writer.ramSizeInBytes()) crosses the maximum you can afford.

But this can confuse the merge policy and cause over-merging unless you set maxBufferedDocs properly. This is because the merge policy looks at the current maxBufferedDocs to figure out which segments are level 0 (first flushed) or level 1 (merged from level 0 segments).

I'm not sure how to fix this. Maybe we can look at the net size (in bytes) of a segment and "infer" its level from that? Still, we would have to be resilient to the application suddenly increasing the RAM allowed.

The good news is that, to work around this bug, I think you just need to ensure that your maxBufferedDocs is less than mergeFactor * typical-number-of-docs-flushed.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
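
To illustrate the workaround condition in the report, here is a minimal sketch in plain Java. It is a toy model and not Lucene's actual merge code: it assumes level 0 covers segments of up to maxBufferedDocs documents and that each higher level covers mergeFactor times more. The class MergeLevelSketch, its methods, and the numbers in main are hypothetical and chosen only for illustration.

```java
public class MergeLevelSketch {

    /**
     * Toy level inference (an assumption, not Lucene's implementation):
     * level 0 holds segments with up to maxBufferedDocs documents, and
     * each higher level holds up to mergeFactor times more.
     */
    public static int inferLevel(int docCount, int maxBufferedDocs, int mergeFactor) {
        if (maxBufferedDocs < 1 || mergeFactor < 2) {
            throw new IllegalArgumentException("need maxBufferedDocs >= 1 and mergeFactor >= 2");
        }
        int level = 0;
        long upperBound = maxBufferedDocs;
        while (docCount > upperBound) {
            upperBound *= mergeFactor;
            level++;
        }
        return level;
    }

    /** Workaround condition as stated in the report. */
    public static boolean workaroundHolds(int maxBufferedDocs, int mergeFactor,
                                          int typicalDocsFlushed) {
        return maxBufferedDocs < (long) mergeFactor * typicalDocsFlushed;
    }

    public static void main(String[] args) {
        int mergeFactor = 10;          // hypothetical setting
        int typicalDocsFlushed = 100;  // hypothetical docs per RAM-triggered flush
        int mergedDocs = mergeFactor * typicalDocsFlushed;

        // Compare a maxBufferedDocs that satisfies the condition with one that does not.
        for (int maxBufferedDocs : new int[] {500, 5000}) {
            System.out.println("maxBufferedDocs=" + maxBufferedDocs
                + " workaroundHolds=" + workaroundHolds(maxBufferedDocs, mergeFactor, typicalDocsFlushed)
                + " flushedLevel=" + inferLevel(typicalDocsFlushed, maxBufferedDocs, mergeFactor)
                + " mergedLevel=" + inferLevel(mergedDocs, maxBufferedDocs, mergeFactor));
        }
    }
}
```

Under this model, with maxBufferedDocs = 500 a segment merged from ten 100-document flushes lands at level 1, one level above the flushed segments. With maxBufferedDocs = 5000 the merged segment still looks like level 0, so it would be picked up again alongside freshly flushed segments, which is the over-merging the report describes.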