lucene-java-user mailing list archives

From Tri Cao <tm...@me.com>
Subject Re: search performance
Date Mon, 02 Jun 2014 19:45:28 GMT
This is an interesting performance problem, and I think there is probably not
a single answer here, so I'll just lay out the steps I would take to tackle it:

1. What is the variance of the query latency? You said the average is 5 minutes,
but is that due to a few really bad queries, or do most queries perform about the same?

2. We are more or less assuming that index size and number of docs are the issue here.
Can you validate that assumption by building indexes with 10M, 50M, … docs
and seeing how much worse the performance gets as a function of size?

3. What is the average number of doc hits for the bad queries? If your queries match
a lot of docs, scoring will be very expensive. While you only ask for the top 1000
scored docs, Lucene still needs to score all the hits to find those 1000 docs.
If this is the case, there could be some workarounds, but let's make sure
that it's indeed the situation we are dealing with here.
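
For step 1, here is a minimal sketch of how the latency spread could be summarized once per-query timings have been logged. The class name `LatencyStats` and the sample values are hypothetical and do not come from this thread; the code uses only the JDK and a nearest-rank percentile. A mean far above the median would suggest a few very slow queries rather than uniformly slow ones.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Lucene): summarizes per-query latencies in milliseconds. */
final class LatencyStats {

    /** Nearest-rank percentile; p must be in (0, 100]. */
    static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = latenciesMs.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Arithmetic mean of the samples. */
    static double mean(long[] latenciesMs) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long sum = 0;
        for (long v : latenciesMs) {
            sum += v;
        }
        return (double) sum / latenciesMs.length;
    }
}
```

With made-up timings such as `{100, 120, 110, 90, 3000}`, comparing `LatencyStats.mean(...)` against `LatencyStats.percentile(..., 50)` shows how a single outlier can dominate the average.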

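For step 3, the sketch below illustrates why asking for only the top k results does not avoid the scoring cost. It is a simplified stand-in written against the JDK only, not Lucene's actual collector code, and the names `TopKSketch` and `Result` are hypothetical. A bounded min-heap keeps the k best scores, but the loop still has to look at the score of every hit. In Lucene itself, the number of matching docs for a query is reported in `TopDocs.totalHits`, which is a quick way to check how many hits the bad queries produce.

```java
import java.util.PriorityQueue;

/** Hypothetical illustration (not Lucene code): top-k selection over already-matched hits. */
final class TopKSketch {

    /** The k best scores in descending order, plus how many hits had to be examined. */
    static final class Result {
        final float[] top;
        final int scored;

        Result(float[] top, int scored) {
            this.top = top;
            this.scored = scored;
        }
    }

    static Result topK(float[] hitScores, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        PriorityQueue<Float> heap = new PriorityQueue<>();  // min-heap: weakest kept score on top
        int scored = 0;
        for (float score : hitScores) {
            scored++;                        // every hit is examined, whatever k is
            if (heap.size() < k) {
                heap.add(score);
            } else if (score > heap.peek()) {
                heap.poll();                 // evict the weakest of the current top k
                heap.add(score);
            }
        }
        float[] top = new float[heap.size()];
        for (int i = top.length - 1; i >= 0; i--) {
            top[i] = heap.poll();            // heap yields ascending, so fill from the back
        }
        return new Result(top, scored);
    }
}
```

The heap work per hit is small, so the cost is dominated by producing a score for each of the matching docs; that is why the hit count of the slow queries matters more than the requested 1000.
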
Hope this helps,
Tri

On Jun 01, 2014, at 11:50 PM, Jamie <jamie@mailarchiva.com> wrote:

Greetings

Despite following all the recommended optimizations (as described at 
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed), in some of 
our installations, search performance has reached the point where it is 
unacceptably slow. For instance, in one environment, the total index 
size is 200GB, with 150 million documents indexed. With NRT enabled, 
search speed is roughly 5 minutes on average. The server resources are: 
2x6 Core Intel CPU, 128GB, 2 SSD for index and RAID 0, with Linux.

The only thing we haven't yet done is to upgrade Lucene from 4.7.x to 
4.8.x. Is this likely to make any noticeable difference in performance?

Clearly, longer term, we need to move to a distributed search model. We 
thought of taking advantage of the distributed search features offered in 
Solr. However, our solution is very tightly integrated into Lucene 
directly (since Solr didn't exist when we started out). Moving to Solr 
now seems like a daunting prospect. We've also been following the Katta 
project with interest, but it doesn't appear to support distributed 
indexing, and development on it seems to have stalled. It would be nice 
if there were a distributed search project on the Lucene level that we 
could use.

I realize this is a rather vague question, but are there any further 
suggestions on ways to improve search performance? We need cheap and 
dirty ideas, as well as longer term advice on a possible path forward.

Much appreciated

Jamie
