lucene-java-user mailing list archives

From Jon Stewart
Subject Re: search performance
Date Tue, 03 Jun 2014 13:24:59 GMT
With regard to pagination, is there a way for you to cache the
IndexSearcher, Query, and TopDocs between a user's pagination requests
(a lot of webapp frameworks have object caching mechanisms)? If so,
you may have luck with code like this:

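  // Assumes Docs was filled by an initial search() when the query was cached.
  void ensureTopDocs(final int rank) throws IOException {
    // Restart from the top if the cached window begins past the requested rank.
    if (StartDocIndex > rank) {
      Docs = Searcher.search(SearchQuery, TOP_DOCS_WINDOW);
      StartDocIndex = 0;
    }
    int len = Docs.scoreDocs.length;
    // Page forward until the window covers rank; stop if the results run out.
    while (len > 0 && StartDocIndex + len <= rank) {
      StartDocIndex += len;
      Docs = Searcher.searchAfter(Docs.scoreDocs[len - 1],
          SearchQuery, TOP_DOCS_WINDOW);
      len = Docs.scoreDocs.length;
    }
  }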

StartDocIndex is a member variable holding the rank of the first item
in the current TopDocs ("Docs") window. I call this function before
each Document retrieval. The common cases--the user looking at the
first page of results, or advancing to the next page--are quite
fast. It still supports random access, albeit not in constant time:
reaching rank r takes about r / TOP_DOCS_WINDOW searchAfter() calls.
OTOH, if your app is concurrent, most search queries will probably
return very quickly, so the odd query that wants to jump deep into
the result set will have more of the server's resources available to
it.
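
A minimal sketch of the same windowing logic, runnable without an
index. HitSource, RangeSource, and WindowCursor are hypothetical
stand-ins made up for illustration, not Lucene classes: the two
interface methods only mimic the shape of IndexSearcher.search() and
searchAfter(), and hits are plain ints instead of ScoreDocs.

```java
// Hypothetical stand-in for the two Lucene calls used above; not a Lucene API.
interface HitSource {
  int[] search(int n);                // first n hits, like search(query, n)
  int[] searchAfter(int last, int n); // next n hits after `last`
}

// Toy source whose ranked hits are simply the ids 0 .. total-1.
final class RangeSource implements HitSource {
  private final int total;
  int calls = 0; // number of searches issued, to make the cost visible

  RangeSource(int total) { this.total = total; }

  public int[] search(int n) { return slice(0, n); }

  public int[] searchAfter(int last, int n) { return slice(last + 1, n); }

  private int[] slice(int from, int n) {
    calls++;
    int len = Math.max(0, Math.min(n, total - from));
    int[] out = new int[len];
    for (int i = 0; i < len; i++) out[i] = from + i;
    return out;
  }
}

// Keeps one window of hits and slides it forward; restarts only to go backwards.
final class WindowCursor {
  private final HitSource source;
  private final int window;
  private int[] docs;     // current window, plays the role of Docs
  private int startIndex; // rank of docs[0], plays the role of StartDocIndex

  WindowCursor(HitSource source, int window) {
    this.source = source;
    this.window = window;
    this.docs = source.search(window);
    this.startIndex = 0;
  }

  // Returns the hit at `rank`, or -1 if rank is past the end of the results.
  int get(int rank) {
    if (rank < 0) throw new IllegalArgumentException("rank must be >= 0");
    if (startIndex > rank) { // jumping backwards: restart from the top
      docs = source.search(window);
      startIndex = 0;
    }
    int len = docs.length;
    while (len > 0 && startIndex + len <= rank) { // page forward
      startIndex += len;
      docs = source.searchAfter(docs[len - 1], window);
      len = docs.length;
    }
    int offset = rank - startIndex;
    return offset < len ? docs[offset] : -1;
  }
}
```

With a RangeSource of 35 hits and a window of 10, get(0) is served from
the initial window, get(12) costs one extra search, and asking for a
rank past the end returns -1 rather than indexing into an empty window.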

Also, given the size of your result sets, you have to allocate a lot
of memory upfront, which will then get gc'd after some time. From
query to query, you will have a decent amount of memory churn. This
isn't free. My guess is that Lucene's linear pagination (search()
followed by searchAfter()) will be faster than your current approach,
if only because it avoids creating such large arrays.
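
To put rough numbers on that trade-off, the sketch below counts hits
rather than bytes. It assumes the current approach collects every hit
up to the requested rank in a single search (an assumption; the thread
does not show that code). PagingCost and its method names are made up
for illustration.

```java
// Hypothetical helper, not part of Lucene: counts hits held and searches issued.
final class PagingCost {
  // One big search: every hit up to and including `rank` is held at once.
  static long hitsHeldSingleSearch(long rank) {
    return rank + 1;
  }

  // Windowed paging: only one window of hits is held at any moment.
  static long hitsHeldWindowed(int window) {
    return window;
  }

  // Searches needed to walk from rank 0 to `rank` in full windows
  // (one search() plus rank / window searchAfter() calls).
  static long searchesToReach(long rank, int window) {
    return rank / window + 1;
  }
}
```

For rank 999,999 with a window of 1,000, that is 1,000,000 hits held at
once versus 1,000, at the price of 1,000 searches instead of one.
Whether that trade wins depends on the index and the query, so it is
worth measuring.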

I'm not the Lucene expert that Robert is, but this has worked alright for me.



On Tue, Jun 3, 2014 at 8:47 AM, Jamie wrote:
> Robert, thanks. I've already done a similar thing. Results on my test
> platform are encouraging.
> On 2014/06/03, 2:41 PM, Robert Muir wrote:
>> Reopening for every search is not a good idea. This will have an
>> extremely high cost (not as high as what you are doing with "paging"
>> but still not good).
>> Instead consider making it near-realtime, by doing this every second
>> or so instead. Look at SearcherManager for code that helps you do
>> this.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

Jon Stewart, Principal
(646) 719-0317 | Arlington, VA
