From: "Michael McCandless"
To: java-dev@lucene.apache.org
Subject: Re: [jira] Created: (LUCENE-854) Create merge policy that doesn't periodically inadvertently optimize
Date: Wed, 02 May 2007 15:59:08 -0400

"Ning Li" wrote:
> On 3/31/07, Michael McCandless (JIRA) wrote:
> > Create merge policy that doesn't periodically inadvertently optimize
> > --------------------------------------------------------------------
> > So we could make a small change to the policy by only merging the
> > first mergeFactor segments once we hit 2X the merge factor.
> > With mergeFactor=10, when we have created the 20th level 0 (just flushed)
> > segment, we merge the first 10 into a level 1 segment. Then on
> > creating another 10 level 0 segments, we merge the second set of 10
> > level 0 segments into a level 1 segment, etc.
>
> Hi Mike,
>
> When a 20th level 0 segment triggers a 20th level 1 segment which
> triggers a 20th level 2 segment... we are still optimizing, aren't we?
> Am I missing something here?

The merge would "cascade" in this case, but it would not optimize (you
would still have more than 1 segment in the end). Each time you cascade
you only merge the first 10 segments at each level, so after cascading
you would have 1 level 3 segment, 10 level 2 segments, 10 level 1
segments and 10 level 0 segments.

I'm actually using this merge policy in my patch for LUCENE-843 when
merging the flushed "partial" segments. This is only used when
IndexWriter is opened with autoCommit=false and you add lots and lots
of documents (so RAM flushes many times).

But I like the merge policy proposed at the end of LUCENE-845 even
better for Lucene's normal merges. It would merge based on size (not
# docs), would be free to merge adjacent segments (not just the
rightmost segments), and would merge N (configurable) at a time. The
part that's still unclear is how it chooses when to "trigger" a merge
and how specifically it picks which N segments to merge (maybe: the
series of N adjacent segments that are "most similar" in size, but
favoring smaller segments over larger ones).

Mike

--
Michael McCandless mail@mikemccandless.com
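
To make the cascade arithmetic above concrete, here is a minimal sketch that only counts segments per level. It is not Lucene code and does not use any Lucene API: the class name CascadeMergeSketch and the methods simulate and addSegment are hypothetical, and it assumes that a merge at a level fires exactly when that level reaches 2 * mergeFactor segments and always consumes the oldest mergeFactor of them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: tracks segment counts per level,
// not real segments, documents, or sizes.
class CascadeMergeSketch {

    /**
     * Returns the number of segments at each level (index 0 = just flushed)
     * after `flushes` level 0 segments have been written.
     */
    static List<Integer> simulate(int mergeFactor, int flushes) {
        List<Integer> counts = new ArrayList<>();
        for (int i = 0; i < flushes; i++) {
            addSegment(counts, 0, mergeFactor);
        }
        return counts;
    }

    // Adds one segment at `level`; once that level holds 2 * mergeFactor
    // segments, the first mergeFactor are merged into a single segment at
    // the next level, which may itself trigger a merge (the "cascade").
    private static void addSegment(List<Integer> counts, int level, int mergeFactor) {
        while (counts.size() <= level) {
            counts.add(0);
        }
        counts.set(level, counts.get(level) + 1);
        if (counts.get(level) == 2 * mergeFactor) {
            counts.set(level, mergeFactor);          // the other mergeFactor segments stay put
            addSegment(counts, level + 1, mergeFactor);
        }
    }

    public static void main(String[] args) {
        // 2110 flushes is the first point at which a level 3 segment appears
        // with mergeFactor=10: 10 + 10*10 + 10*100 + 1*1000 = 2110.
        System.out.println(simulate(10, 2110)); // prints [10, 10, 10, 1]
    }
}
```

Under these assumptions the index never collapses to a single segment: every level that has merged at least once always keeps at least mergeFactor segments behind.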