From: Doug Cutting
Date: Wed, 07 Dec 2005 12:53:46 -0800
To: java-user@lucene.apache.org
Subject: Re: Lucene performance bottlenecks

Andrzej Bialecki wrote:
> It's nice to have these couple percent... however, it doesn't solve the
> main problem; I need 50 or more percent increase... :-) and I suspect
> this can be achieved only by some radical changes in the way Nutch uses
> Lucene. It seems the default query structure is too complex to get a
> decent performance.

That would certainly help.

For what it's worth, the Internet Archive has ~10M page Nutch indexes
that perform adequately. See:

http://websearch.archive.org/katrina/

The performance is about what you report, but it is quite usable.
(Please don't stress-test this server!)

We recently built a ~100M page Nutch index at the Internet Archive that
is surprisingly usable on a single CPU. (This is not yet publicly
accessible.)

Perhaps your traffic will be much higher than the Internet Archive's, or
you have contractual obligations that specify certain average query
performance, but, if not, ~10M pages is quite searchable using Nutch on
a single CPU.

Doug