Date: Fri, 23 Mar 2007 18:35:30 +0530
From: "SK R"
To: java-user@lucene.apache.org
Subject: Re: MergeFactor and MaxBufferedDocs value should ...?

Please clarify the following:

1. When are the segments in the RAMDirectory moved (flushed) into the FSDirectory?
2. Segment creation driven by maxBufferedDocs occurs in the RAMDirectory. Where does merging driven by mergeFactor happen: in the RAMDirectory or the FSDirectory?

Thanks in advance,
RSK

On 3/23/07, Michael McCandless wrote:
>
>
> "SK R" wrote:
> > If I set MergeFactor = 100 and MaxBufferedDocs = 250, then the first 100
> > segments will be merged in the RAMDir when 100 docs have arrived. After the
> > 350th doc is added to the writer, the RAMDir has 2 merged segment files + 50
> > separate segment files not merged together, and these are flushed to the FSDir.
> >
> > If I'm wrong, please correct me.
> >
> > My doubt is whether we should set MergeFactor & MaxBufferedDocs in a
> > proportional ratio (i.e. MaxBufferedDocs = n*MergeFactor where n = 1, 2, ...)
> > to reduce indexing time and get better performance, or whether there is no
> > need to worry about their relation?
> >
>
> Actually, maxBufferedDocs is how many docs are held in RAM before
> flushing to a single segment. So with 250, after adding the 250th doc
> the writer will write the first segment; after adding the 500th doc,
> it writes the second segment, etc.
>
> Then, mergeFactor says how many segments can be written before a merge
> takes place. A mergeFactor of 10 means that after writing 10 such
> segments, they will be merged into a single segment with 2500 docs.
> After another 2500 docs you'll have 2 such segments. Then once you've
> added your 25000th doc, all of the 2500-doc segments will be merged
> into a single 25000-doc segment, etc.
>
> To maximize indexing performance you really want maxBufferedDocs to be
> as large as you can handle (the bigger you make it, the more RAM the
> writer requires).
>
> I believe (not certain) larger values of mergeFactor will also improve
> performance, since that defers merging as long as possible. However, the
> larger you make it, the more segments are allowed to exist in your
> index, and at some point you will hit file handle limits with your
> searchers.
>
> I don't think these two parameters need to be proportional to one
> another, and I don't think the ratio between them affects performance.
>
> Another performance boost is to turn off the compound file format, but
> this has the severe cost of requiring far more file handles during
> searching.
>
> Mike
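To make the flush and merge arithmetic in the quoted reply concrete, here is a minimal Java sketch. It is a hypothetical, simplified model, not Lucene's IndexWriter code: the class and method names (MergeScheduleSketch, simulate, cascadeMerges) are made up for illustration, and it assumes that every flush writes exactly maxBufferedDocs docs and that as soon as mergeFactor equal-sized segments exist they are merged into one, cascading up level by level. Deletions, maxMergeDocs, RAM-based flushing and optimize() are ignored; it only tracks the doc count of each segment.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of the flush/merge schedule described in the
 * thread. NOT Lucene's IndexWriter implementation: it only tracks the doc count
 * of each segment and ignores deletions, maxMergeDocs and optimize().
 */
public class MergeScheduleSketch {

    /** Returns the doc counts of the flushed segments after adding totalDocs docs. */
    static List<Integer> simulate(int totalDocs, int maxBufferedDocs, int mergeFactor) {
        if (maxBufferedDocs < 1 || mergeFactor < 2) {
            throw new IllegalArgumentException("need maxBufferedDocs >= 1 and mergeFactor >= 2");
        }
        List<Integer> segments = new ArrayList<>();
        int buffered = 0; // docs held in RAM that have not been flushed yet
        for (int doc = 1; doc <= totalDocs; doc++) {
            buffered++;
            if (buffered == maxBufferedDocs) {
                segments.add(buffered); // flush the buffered docs as one new segment
                buffered = 0;
                cascadeMerges(segments, maxBufferedDocs, mergeFactor);
            }
        }
        // Docs still buffered (fewer than maxBufferedDocs) are not in any segment yet.
        return segments;
    }

    /**
     * While the newest mergeFactor segments all have the current level's size,
     * replaces them with one segment mergeFactor times larger, then moves up a level.
     */
    static void cascadeMerges(List<Integer> segments, int levelSize, int mergeFactor) {
        while (segments.size() >= mergeFactor) {
            int n = segments.size();
            for (int i = n - mergeFactor; i < n; i++) {
                if (segments.get(i) != levelSize) {
                    return; // not enough equal-sized segments at this level yet
                }
            }
            for (int i = 0; i < mergeFactor; i++) {
                segments.remove(segments.size() - 1); // drop the segments being merged
            }
            levelSize *= mergeFactor;
            segments.add(levelSize); // the merged segment
        }
    }

    public static void main(String[] args) {
        // maxBufferedDocs = 250 and mergeFactor = 10, as in the reply.
        System.out.println(simulate(2500, 250, 10));  // [2500]
        System.out.println(simulate(5000, 250, 10));  // [2500, 2500]
        System.out.println(simulate(25000, 250, 10)); // [25000]
        // SK R's settings: after 350 docs, one 250-doc segment; 100 docs still buffered.
        System.out.println(simulate(350, 250, 100));  // [250]
    }
}
```

Under these assumptions, the last line shows that with MergeFactor = 100 and MaxBufferedDocs = 250 no merge happens within the first 350 docs: there is a single flushed 250-doc segment, and the remaining 100 docs are still buffered in RAM.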