lucene-java-user mailing list archives

From "Erick Erickson"
Subject Re: Optimizing search speed & performance for a 10G Index.
Date Fri, 08 Dec 2006 12:37:32 GMT
Other questions besides Grant's...

What is the overall response time? If your overall response time is 10
seconds, tuning Lucene won't help you much.

Are you re-opening a searcher for each search? This is bad: opening a
searcher is expensive, so open one and share it across requests.
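
To illustrate the point about not re-opening a searcher, here is a minimal sketch of an "open once, share everywhere" holder. It deliberately uses only the Java standard library: `SharedSearcher` and its `opener` argument are hypothetical names, not part of Lucene. In a real application the opener would construct the `IndexSearcher`, which is safe to share between threads.

```java
import java.util.function.Supplier;

/**
 * Hypothetical holder that opens an expensive resource once and shares it.
 * In a Lucene application the resource would be the IndexSearcher.
 */
final class SharedSearcher<S> {
    private final Supplier<S> opener; // how to open a searcher (assumed to be expensive)
    private volatile S searcher;      // the single shared instance, opened lazily

    SharedSearcher(Supplier<S> opener) {
        this.opener = opener;
    }

    /** Returns the shared instance, opening it on first use only. */
    S get() {
        S local = searcher;
        if (local == null) {
            synchronized (this) {
                local = searcher;
                if (local == null) {
                    local = opener.get();
                    searcher = local;
                }
            }
        }
        return local;
    }

    /** Swaps in a freshly opened instance, e.g. after the index has been updated. */
    synchronized void reopen() {
        searcher = opener.get();
    }
}
```

Every request handler would call `get()` rather than opening its own searcher. Note that this sketch does not close the old instance on `reopen()`; doing that safely means waiting for in-flight searches to finish first.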

How many documents are you finding, and are you iterating over a Hits object
for more than 100 documents? If so, think about using a HitCollector/TopDocs
object instead.
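
The idea behind the collector approach can be sketched as follows: the search calls back once per matching document, and the collector keeps only the best n in a bounded heap, so nothing has to be re-fetched as you walk further down the results. This is a standard-library stand-in, not Lucene's own `HitCollector` or `TopDocs` classes; `TopNCollector` and `ScoredDoc` are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical top-n collector; a stand-in for the idea, not Lucene's API. */
final class TopNCollector {
    /** One collected hit: a document id and its score. */
    static final class ScoredDoc {
        final int doc;
        final float score;

        ScoredDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final int n;
    // Min-heap on score: the weakest of the current top n sits at the head.
    private final PriorityQueue<ScoredDoc> heap =
        new PriorityQueue<>(Comparator.comparingDouble((ScoredDoc d) -> d.score));

    TopNCollector(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        this.n = n;
    }

    /** Called once per matching document, in any order. */
    void collect(int doc, float score) {
        if (heap.size() < n) {
            heap.add(new ScoredDoc(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll(); // evict the current weakest hit
            heap.add(new ScoredDoc(doc, score));
        }
    }

    /** Returns the collected hits, best score first. */
    List<ScoredDoc> topDocs() {
        List<ScoredDoc> out = new ArrayList<>(heap);
        out.sort(Comparator.comparingDouble((ScoredDoc d) -> d.score).reversed());
        return out;
    }
}
```

Memory stays proportional to n regardless of how many documents match, which is the property that matters when a query matches a large share of a 2-million-document index.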

What else is running on the machines? What kinds of timings are you getting
if you run just the search on a dedicated machine?

What happens if you take away the boosts?

What are you measuring when you measure response time? Does it include the
time you spend processing the hits or just the time to execute the search
call? You might get interesting information if you also measure the time to
execute the searches only; it might give you a hint on where to look next.
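
One way to separate the two measurements is sketched below, using only `System.nanoTime()`. `PhaseTimer` is a hypothetical helper, and the lambdas passed to it stand in for whatever your application does to run the search and to process the hits.

```java
import java.util.function.Supplier;

/** Hypothetical helper that accumulates search time and hit-processing time separately. */
final class PhaseTimer {
    private long searchNanos;
    private long processingNanos;

    /** Runs the search call, adds its elapsed time to the search total, returns its result. */
    <T> T timeSearch(Supplier<T> search) {
        long start = System.nanoTime();
        try {
            return search.get();
        } finally {
            searchNanos += System.nanoTime() - start;
        }
    }

    /** Runs the hit-processing step and adds its elapsed time to the processing total. */
    void timeProcessing(Runnable processing) {
        long start = System.nanoTime();
        try {
            processing.run();
        } finally {
            processingNanos += System.nanoTime() - start;
        }
    }

    long searchNanos() {
        return searchNanos;
    }

    long processingNanos() {
        return processingNanos;
    }
}
```

Wrapping the search call in `timeSearch(...)` and the result handling in `timeProcessing(...)`, then logging both totals per request, would show whether the 2 seconds goes to the search itself or to what happens afterwards. The timer is not thread-safe, so use one instance per request.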

And I can't agree with Grant strongly enough when he recommends a profiler
before launching into any major optimization. I've spent far too much time
optimizing what I *knew* was the problem area only to find that it wasn't
<G>. It's certainly worth a few simple experiments to see if you can bracket
the problem, but if the easy stuff doesn't work, get a profiler.


On 12/8/06, Chun Wei Ho wrote:
> Hi,
> We run a search engine based on Lucene 1.9.1 / Nutch 0.7.2. Our index
> has approximately 2 million documents and the physical size of it is
> about 10 GB. We run it as a tomcat web application on a Fedora Core 4
> server with dual Xeon 3.2GHz processors and 4GB RAM.
> We receive about 46500 web search requests a day (ranging from 50-300
> requests per 5 minutes across the day). Each web search request could
> spawn about one to three actual Lucene searches. Our average response
> time (calculated from the server side - and so excludes network
> latency), is about 2 seconds.
> Does this timing of 2 seconds appear plausible for Lucene, based on
> the machine specifications above?
> Our index is slightly more complex (with multiple fields like title,
> location, site, content). For example, a search for "Linux and Lucene"
> related entries in "Australia" might result in lucene searches for:
> ((title:linux^1.0 title:lucene^1.0)^4.0)
> +((
> +(title:linux^5.0 location:linux^1.5 content:linux^1.0)
> +(title:lucene^5.0 location:lucene^1.5 content:lucene^1.0))
> ((+(+content:linux +content:lucene)) +(site:contentsite1
> site:contentsite2 site:contentsite3 site:contentsite4
> site:contentsite5 site:contentsite6 site:contentsite7)))^0.01))
> +location:australia)
> +newsdate:[20061107 TO 20061208]
> +region:au)
> -jobsite:badsite1 -region:badregion1 -jobsite:badsite2
> -jobsite:badsite3 -jobsite:badsite4
> Does anyone have ideas, or could anyone point us to resources, that would
> allow us to improve this performance? A 2-second response time plus network
> latency gives an impression of "slowness" on our site that we are
> trying to reduce.
> Thank you.