Message-ID: <45CF791A.6030305@gmail.com>
Date: Sun, 11 Feb 2007 15:14:18 -0500
From: Mark Miller
To: java-user@lucene.apache.org
Subject: Re: about merge factor
In-Reply-To: <36429AB7-55E7-4ABE-AA2B-55F2D73FD40F@apache.org>

Found a mistake in my response: when I was talking about max merge
docs, I meant max buffered docs. If you're going to optimize anyway,
the key setting appears to be max buffered docs, and I have yet to see
the merge factor affect anything (again, only if you optimize). Oddly,
performance seems to decrease as you raise max buffered docs, long
before you are even close to running out of available RAM. I do not
know why this is, but you should certainly test to find the best
settings for your setup.

Also, the new benchmarking stuff is awesome.

- Mark

Grant Ingersoll wrote:
> You may find contrib/Benchmark useful in your testing. Doron Cohen
> has added a nice framework for scripting benchmarking tests.
>
> -Grant
>
> On Feb 11, 2007, at 12:14 PM, Mark Miller wrote:
>
>> Not sensible at all. First, a merge factor above something like 90
>> most likely never makes sense. Second, I have done some testing and
>> my results show that if you optimize the index after loading, the
>> merge factor really doesn't matter, so keep it at 10 (I never used a
>> max merge docs below 50. 100 worked best, 1,000 and 2,000 slowed
>> things down even though the test had access to 600MB RAM and the
>> docs were around 10-20k each).
>> Setting up a test harness that automatically
>> indexes a good amount of docs (I did 20,000) with a variety of
>> settings will tell you a lot. Results will obviously vary based on
>> your setup.
>>
>> - Mark
>>
>> maureen tanuwidjaja wrote:
>>> Hi all,
>>> I am just wondering whether it is sensible and possible, if I have
>>> 660,000 documents to be indexed, to set the merge factor to 660,000
>>> instead of the default value 10 (...and this means no merge while
>>> indexing) and later, after closing the index, use the IndexWriter
>>> to optimize/merge the whole index file...
>>> Thanks and Regards,
>>> Maureen
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>
> --------------------------
> Grant Ingersoll
> Center for Natural Language Processing
> http://www.cnlp.org
>
> Read the Lucene Java FAQ at
> http://wiki.apache.org/jakarta-lucene/LuceneFAQ
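
The kind of test harness suggested in this thread (index a fixed batch of docs under a variety of settings and compare the timings) could be sketched roughly as follows. This is only a minimal sketch and not code from the thread: IndexSettingsSweep, IndexRun, Result, sweep and fastest are hypothetical names, and the workload in main is a stand-in that does no real indexing. In a real test the IndexRun body would create a Lucene IndexWriter, apply the settings (for example via setMaxBufferedDocs and setMergeFactor), add the docs, call optimize() and close the writer.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexSettingsSweep {

    /** Hypothetical hook: build an index of docCount docs with these settings, then optimize it. */
    public interface IndexRun {
        void run(int maxBufferedDocs, int mergeFactor, int docCount);
    }

    /** Wall-clock time measured for one combination of settings. */
    public static final class Result {
        public final int maxBufferedDocs;
        public final int mergeFactor;
        public final long millis;

        public Result(int maxBufferedDocs, int mergeFactor, long millis) {
            this.maxBufferedDocs = maxBufferedDocs;
            this.mergeFactor = mergeFactor;
            this.millis = millis;
        }
    }

    /** Runs the hook once per (maxBufferedDocs, mergeFactor) pair and times each run. */
    public static List<Result> sweep(IndexRun run, int[] maxBufferedDocsValues,
                                     int[] mergeFactors, int docCount) {
        List<Result> results = new ArrayList<>();
        for (int maxBufferedDocs : maxBufferedDocsValues) {
            for (int mergeFactor : mergeFactors) {
                long start = System.nanoTime();
                run.run(maxBufferedDocs, mergeFactor, docCount);
                long millis = (System.nanoTime() - start) / 1_000_000L;
                results.add(new Result(maxBufferedDocs, mergeFactor, millis));
            }
        }
        return results;
    }

    /** Returns the result with the smallest elapsed time (the first one wins a tie). */
    public static Result fastest(List<Result> results) {
        if (results.isEmpty()) {
            throw new IllegalArgumentException("no results to compare");
        }
        Result best = results.get(0);
        for (Result r : results) {
            if (r.millis < best.millis) {
                best = r;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption): replace with real indexing plus optimize.
        IndexRun standIn = (maxBufferedDocs, mergeFactor, docCount) -> {
            long sink = 0;
            for (int i = 0; i < docCount; i++) {
                sink += i % maxBufferedDocs;
            }
            if (sink < 0) {
                throw new IllegalStateException("unexpected overflow");
            }
        };

        // Values taken from the thread: buffered docs 50 to 2,000, 20,000 docs per run.
        List<Result> results = sweep(standIn, new int[] {50, 100, 1000, 2000},
                                     new int[] {10, 90}, 20000);
        for (Result r : results) {
            System.out.println("maxBufferedDocs=" + r.maxBufferedDocs
                    + " mergeFactor=" + r.mergeFactor + " ms=" + r.millis);
        }
        Result best = fastest(results);
        System.out.println("fastest: maxBufferedDocs=" + best.maxBufferedDocs
                + " mergeFactor=" + best.mergeFactor);
    }
}
```

Each combination runs only once here, so a single timing is noisy; repeating each run and building into a fresh index directory every time would give more trustworthy numbers.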