lucene-java-user mailing list archives

From Jamie <>
Subject Re: search performance
Date Tue, 03 Jun 2014 10:42:54 GMT
Vitaly / Robert

I wouldn't go so far as to call our pagination naive!? Sub-optimal, yes.
Unless I am mistaken, the Lucene library's pagination mechanism assumes
that you will cache the ScoreDocs for the entire result set. This is not
practical when you have a result set that exceeds 60M. In any case, as
stated earlier, it is the first query that is slow.
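
For illustration only, here is a minimal, Lucene-free sketch of cursor-style paging, where each page is collected with a bounded priority queue and only the last hit of the previous page is remembered, so no score list for the whole result set has to be cached. The Hit class, the pageAfter method and the RANK comparator are hypothetical stand-ins, not Lucene classes. Lucene's IndexSearcher.searchAfter takes a similar "last hit of the previous page" argument, but this sketch does not claim to reproduce its internals.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class CursorPaging {
    /** Hypothetical stand-in for a scored hit (not a Lucene class). */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Ranking order: higher score first, ties broken by lower doc id.
    static final Comparator<Hit> RANK = (a, b) -> {
        int c = Float.compare(b.score, a.score);
        return c != 0 ? c : Integer.compare(a.doc, b.doc);
    };

    /**
     * Returns up to pageSize hits ranked strictly after 'after'
     * (pass null for the first page). pageSize must be at least 1.
     */
    static List<Hit> pageAfter(Iterable<Hit> allHits, Hit after, int pageSize) {
        // The heap keeps the worst-ranked retained hit on top,
        // so memory use is bounded by pageSize, not by the result set.
        PriorityQueue<Hit> heap = new PriorityQueue<>(pageSize, RANK.reversed());
        for (Hit h : allHits) {
            if (after != null && RANK.compare(h, after) <= 0) {
                continue; // ranked at or before the cursor: already shown
            }
            if (heap.size() < pageSize) {
                heap.add(h);
            } else if (RANK.compare(h, heap.peek()) < 0) {
                heap.poll(); // evict the current worst hit
                heap.add(h);
            }
        }
        List<Hit> page = new ArrayList<>(heap);
        page.sort(RANK);
        return page;
    }
}
```

To fetch the next page, pass the last Hit of the page just shown as the after argument. Every page still scans all matching hits, so this bounds memory rather than the cost of the first query.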

We do open index readers, since we are using NRT search and documents
are being added to the indexes on a continuous basis. When the user
clicks on the Search button, they will expect to see the latest result
set. With regard to NRT search, my understanding is that we do need to
open the index readers on each search operation to see the latest
changes.

Thus, on each search, we open a reader from each corresponding writer
and combine the index readers into a MultiReader.

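protected IndexReader initIndexReader() {
     List<IndexReader> readers = new LinkedList<>();
     for (Writer writer : writers) {
         readers.add(, true);
     }
     // MultiReader takes an IndexReader[]; true closes the sub-readers with it
     return new MultiReader(readers.toArray(new IndexReader[0]), true);
}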

Thank you for your ideas/suggestions.


On 2014/06/03, 12:29 PM, Vitaly Funstein wrote:
> Jamie,
> What if you were to forget for a moment the whole pagination idea, and
> always capped your search at 1000 results for testing purposes only? This
> is just to try and pinpoint the bottleneck here; if, regardless of the
> query parameters, the search latency stays roughly the same and well below
> 5 min, you now have the answer - the problem is your naive implementation
> of pagination which results in snowballing result numbers and search times,
> the closer you get to the end of the results range. Otherwise, I would
> focus on your query and filter next.
