lucene-java-user mailing list archives

From Grant Ingersoll <grant.ingers...@gmail.com>
Subject Re: Optimizing search speed & performance for a 10G Index.
Date Fri, 08 Dec 2006 11:44:13 GMT
Have you done any profiling of your application yet to identify  
bottlenecks (i.e. are you sure it is Lucene)?  Without some  
profiling, you really will just be guessing.  Also, search this and  
the dev. list for performance, as there have been many lengthy  
discussions in the past on optimizations that may give you some  
ideas.  Is there any way you can make it so you don't spawn extra  
searches?
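As a starting point before reaching for a full profiler, something like the following could show where the 2 seconds go within a single request. This is only a minimal sketch in plain Java: the class name `StageTimer` and the stage names are hypothetical, and the lambdas are placeholders standing in for the real query construction, Lucene search, and result rendering.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Accumulates wall-clock time per named stage of a request (hypothetical helper). */
final class StageTimer {
    // Total nanoseconds per stage, kept in first-seen order for readable output.
    private final Map<String, Long> totals = new LinkedHashMap<>();

    /** Runs the work, adds its elapsed time to the stage total, and returns its result. */
    <T> T time(String stage, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            totals.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    Map<String, Long> totalsNanos() {
        return totals;
    }

    public static void main(String[] args) {
        StageTimer timer = new StageTimer();

        // Placeholders: substitute the real query building, searching, and rendering.
        String query = timer.time("buildQuery", () -> "title:linux title:lucene");
        int hits = timer.time("search", () -> query.length());
        timer.time("renderResults", () -> "hits=" + hits);

        timer.totalsNanos().forEach((stage, nanos) ->
            System.out.printf("%-14s %.3f ms%n", stage, nanos / 1e6));
    }
}
```

If most of the time lands outside the "search" stage, the bottleneck is probably not Lucene itself.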

Also, how are you handling the newsdate field?  Are you using a Range
Query or a Range Filter?
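The distinction matters because a RangeQuery in Lucene of this vintage is rewritten into a BooleanQuery with one clause per indexed term in the range, and every clause takes part in scoring. A RangeFilter instead produces a bit set of matching documents that does not affect scoring and can be cached and reused across searches. The sketch below illustrates only the caching idea in plain Java. It is not the Lucene API: `DateRangeFilterCache`, `bitsFor`, and `restrict` are hypothetical names, and the `int[]` of dates stands in for the index.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Conceptual stand-in for a cached range filter (not Lucene code). */
final class DateRangeFilterCache {
    private final int[] docDates;                        // docDates[docId] = yyyymmdd
    private final Map<String, BitSet> cache = new HashMap<>();
    int builds = 0;                                      // how many bit sets were computed

    DateRangeFilterCache(int[] docDates) {
        this.docDates = docDates;
    }

    /** Returns the documents whose date lies in [from, to], computing the set only once per range. */
    BitSet bitsFor(int from, int to) {
        String key = from + "-" + to;
        BitSet bits = cache.get(key);
        if (bits == null) {
            bits = new BitSet(docDates.length);
            for (int doc = 0; doc < docDates.length; doc++) {
                if (docDates[doc] >= from && docDates[doc] <= to) {
                    bits.set(doc);
                }
            }
            cache.put(key, bits);
            builds++;
        }
        return bits;
    }

    /** Intersects the query's matches with the cached date-range bits, leaving both inputs unchanged. */
    BitSet restrict(BitSet queryMatches, int from, int to) {
        BitSet result = (BitSet) queryMatches.clone();
        result.and(bitsFor(from, to));
        return result;
    }

    public static void main(String[] args) {
        int[] dates = {20061101, 20061107, 20061120, 20061208, 20061209};
        DateRangeFilterCache cache = new DateRangeFilterCache(dates);

        BitSet matches = new BitSet();
        matches.set(0); matches.set(2); matches.set(3); matches.set(4);

        // The same range is requested twice, but its bit set is built only once.
        System.out.println(cache.restrict(matches, 20061107, 20061208));
        System.out.println(cache.restrict(matches, 20061107, 20061208));
        System.out.println("bit sets built: " + cache.builds);
    }
}
```

Since a range such as [20061107 TO 20061208] presumably changes only once a day, most searches could reuse the same cached bits.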
Do you have any fields in your documents that are large, stored
fields?  Lazy loading and/or the field selector may help there.
Search this list or the dev list for details.

How are you creating your queries?  Is there a lot of analysis  
involved?

Of course, there always comes a time when you need to look at
distributing the load, but I am not sure you are there yet.  I seem
to recall people being able to handle 10 GB without too much trouble
on a machine of that size, but I could be wrong.
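A rough back-of-the-envelope check, using only the figures from the quoted message below (46500 requests a day, a peak of 300 requests per 5 minutes, up to three Lucene searches per request, a 2 second average response), gives a sense of the load. The class and method names here are hypothetical, and the in-flight estimate assumes Little's law (average concurrency = arrival rate x time in system) applies to the steady state.

```java
/** Back-of-the-envelope load arithmetic from the figures in the quoted message. */
final class LoadEstimate {
    /** Converts a count observed over a window into a per-second rate. */
    static double perSecond(double count, double windowSeconds) {
        return count / windowSeconds;
    }

    /** Little's law: average number of requests in flight = arrival rate x response time. */
    static double inFlight(double requestsPerSecond, double responseSeconds) {
        return requestsPerSecond * responseSeconds;
    }

    public static void main(String[] args) {
        double averageRate = perSecond(46500, 24 * 60 * 60);   // requests/s over a day
        double peakRate = perSecond(300, 5 * 60);              // requests/s at the stated peak
        double peakSearchRate = peakRate * 3;                  // worst case: three searches each
        double peakInFlight = inFlight(peakRate, 2.0);         // with a 2 s average response

        System.out.printf("average requests/s: %.3f%n", averageRate);
        System.out.printf("peak requests/s:    %.3f%n", peakRate);
        System.out.printf("peak searches/s:    %.3f%n", peakSearchRate);
        System.out.printf("peak in flight:     %.3f%n", peakInFlight);
    }
}
```

At roughly one request per second at peak, the volume alone does not obviously call for distribution, which suggests looking at per-query cost first.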

-Grant

On Dec 8, 2006, at 1:10 AM, Chun Wei Ho wrote:

> Hi,
>
> We run a search engine based on Lucene 1.9.1 / Nutch 0.7.2. Our index
> has approximately 2 million documents and the physical size of it is
> about 10 GB. We run it as a tomcat web application on a Fedora Core 4
> server with dual Xeon 3.2GHz processors and 4GB RAM.
>
> We receive about 46500 web search requests a day (ranging from 50-300
> requests per 5 minutes across the day). Each web search request could
> spawn about one to three actual Lucene searches. Our average response
> time (calculated from the server side - and so excludes network
> latency), is about 2 seconds.
>
> Does this timing of 2 seconds appear plausible for Lucene, based on
> the machine specifications above?
>
>
> Our index is slightly more complex (with multiple fields like title,
> location, site, content). For example, a search for "Linux and Lucene"
> related entries in "Australia" might result in lucene searches for:
>
> ((title:linux^1.0 title:lucene^1.0)^4.0)
> +((
> +(title:linux^5.0 location:linux^1.5 content:linux^1.0)
> +(title:lucene^5.0 location:lucene^1.5 content:lucene^1.0))
> ((+(+content:linux +content:lucene)) +(site:contentsite1
> site:contentsite2 site:contentsite3 site:contentsite4
> site:contentsite5 site:contentsite6 site:contentsite7)))^0.01))
> +location:australia)
> +newsdate:[20061107 TO 20061208]
> +region:au)
> -jobsite:badsite1 -region:badregion1 -jobsite:badsite2
> -jobsite:badsite3 -jobsite:badsite4
>
> Does anyone have ideas, or could anyone point us to resources, that
> would allow us to improve this performance? A 2 second response time
> plus network latency gives an impression of "slowness" on our site
> that we are trying to reduce.
>
> Thank you.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/




