lucene-java-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Lucene performance bottlenecks
Date Wed, 07 Dec 2005 20:53:46 GMT
Andrzej Bialecki wrote:
> It's nice to have these couple percent... however, it doesn't solve the 
> main problem; I need 50 or more percent increase... :-) and I suspect 
> this can be achieved only by some radical changes in the way Nutch uses 
> Lucene. It seems the default query structure is too complex to get a 
> decent performance.

That would certainly help.

For what it's worth, the Internet Archive has ~10M page Nutch indexes 
that perform adequately.  See:

http://websearch.archive.org/katrina/

The performance is about what you report, but it is quite usable. 
(Please don't stress-test this server!)  We recently built a ~100M page 
Nutch index at the Internet Archive that is surprisingly usable on a 
single CPU.  (This is not yet publicly accessible.)

Perhaps your traffic will be much higher than the Internet Archive's, or 
you have contractual obligations that specify certain average query 
performance, but, if not, ~10M pages is quite searchable using Nutch on 
a single CPU.

Doug
