Message-ID: <4115F305.8080909@intrafind.de>
Date: Sun, 08 Aug 2004 11:31:49 +0200
From: Bernhard Messer
To: Lucene Developers List
Subject: Re: possible SegmentMerger optimization
References: <4114C745.6000006@intrafind.de> <41153732.5070308@earthlink.net>
In-Reply-To: <41153732.5070308@earthlink.net>

Dmitry,

yep, you're right.
Switching the compound file option off and on is the trick that simulates the behavior I described. I ran some tests on it and found that it works perfectly. I think we can leave everything as it is, but maybe we should document it somewhere. Is there something like a "tips and tricks" section on the Lucene website?

Bernhard

Dmitry Serebrennikov wrote:

> Bernhard Messer wrote:
>
>> hi developers,
>>
>> there may be a small but effective way to optimize the
>> SegmentMerger class when the compound file option is enabled, which
>> is the default since Lucene 1.4.
>>
>> The current implementation creates and writes the compound index file
>> every time the merge() method is called. Because I/O operations are
>> expensive and time-consuming, it would be better to write the
>> compound index file only when optimizing the index. The change itself
>> wouldn't be a big deal: add a boolean parameter, giving
>> SegmentMerger.merge(boolean finalize). Only if finalize==true and the
>> compound option is enabled will the compound file be created. To
>> complete the implementation, the same parameter could be added to
>> mergeSegments(int minSegment, boolean finalize) within IndexWriter.
>> When mergeSegments is called from flushRamSegments() or
>> maybeMergeSegments(), finalize is set to false. Only when it is
>> called from optimize() will finalize be set to true and the compound
>> file be written.
>>
>> The downside is having to explain to developers that if they do not
>> optimize the index before closing it, the compound file option has no
>> effect. The other issue is that we might run into the problem of too
>> many open files, which was sometimes reported before the compound
>> option was introduced.
>
>
> Yea, that was kind of the point of having the compound files - to
> avoid too many file handles, especially during indexing. I hear you on
> inefficient use of disk IO, though.
>
>>
>> The downside could be addressed by making the optimization
>> optionally available through IndexWriter. Developers using Lucene
>> could then decide for themselves whether they want to use the
>> "single compound write" option or not.
>
>
> One could do that today. Just call setUseCompoundFile(false) during
> indexing and call setUseCompoundFile(true) before the final optimize.
> Would that do the trick?
>
> Dmitry.
>
>>
>> If you would like to see the patch, leave me a note and
>> I'll create it.
>>
>> best regards
>> Bernhard

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
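
For a "tips and tricks" note, the toggle Dmitry describes could be sketched roughly as below. This is only a toy model, not Lucene code: ToyIndexWriter, its merge-every-mergeFactor-documents rule, and the compoundWrites counter are all hypothetical stand-ins, and only the setter name follows what I understand to be Lucene 1.4's IndexWriter.setUseCompoundFile(boolean). The sketch counts how often a compound file would be written with the option always on versus switched on only before the final optimize.

```java
// Toy model of the compound-file toggle discussed in this thread.
// ASSUMPTION: ToyIndexWriter is a hypothetical stand-in, not Lucene's
// IndexWriter. It merges once every mergeFactor added documents, and
// every merge writes one compound file if the option is enabled.
final class CompoundToggleSketch {

    static final class ToyIndexWriter {
        private final int mergeFactor;
        private boolean useCompoundFile = true; // on by default, as in Lucene 1.4
        private int pendingDocs = 0;
        private int compoundWrites = 0;

        ToyIndexWriter(int mergeFactor) {
            this.mergeFactor = mergeFactor;
        }

        void setUseCompoundFile(boolean value) {
            useCompoundFile = value;
        }

        void addDocument() {
            pendingDocs++;
            if (pendingDocs >= mergeFactor) {
                merge(); // intermediate merge during indexing
            }
        }

        void optimize() {
            merge(); // final merge of everything
        }

        int getCompoundWrites() {
            return compoundWrites;
        }

        private void merge() {
            pendingDocs = 0;
            if (useCompoundFile) {
                compoundWrites++; // the expensive extra write
            }
        }
    }

    // Indexes `docs` documents and returns the number of compound writes.
    // With toggle == true, the option is off while indexing and switched
    // back on just before optimize().
    static int compoundWrites(int docs, int mergeFactor, boolean toggle) {
        ToyIndexWriter writer = new ToyIndexWriter(mergeFactor);
        if (toggle) {
            writer.setUseCompoundFile(false);
        }
        for (int i = 0; i < docs; i++) {
            writer.addDocument();
        }
        if (toggle) {
            writer.setUseCompoundFile(true);
        }
        writer.optimize();
        return writer.getCompoundWrites();
    }

    public static void main(String[] args) {
        System.out.println("always on: " + compoundWrites(100, 10, false)); // always on: 11
        System.out.println("toggled:   " + compoundWrites(100, 10, true));  // toggled:   1
    }
}
```

The trade-off raised above still applies: with the option off during indexing, each segment keeps its separate files open, so the "too many open files" problem can come back on large indexes.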