lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: what's the best practice for getting "next page" of hits?
Date Thu, 19 Feb 2009 14:18:43 GMT
The best practice is, well, "It Depends" (tm). First off, I wouldn't do any
caching of results unless and until you had reasonable certainty that
you had performance issues, so (b) would be my first choice. And if
you *did* start to see performance issues, I'd look first at why the queries
were expensive rather than at caching. And I'd mine my query logs to be
certain that you were getting a lot of requests for pages 2-N.
There's no point in putting a caching scheme in if only
10% of your queries were for subsequent pages. Or even if 50% of the
queries were for subsequent pages.
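
A minimal sketch of that log check, assuming you have already parsed your
query log into a list of requested page numbers (1 = first page). The class
and method names are made up for illustration, and your log format will
differ:

```java
import java.util.List;

// Hypothetical helper, not part of Lucene: summarizes how often users page.
final class PageRequestStats {
    /** Fraction of logged requests that asked for page 2 or later (0.0 for an empty log). */
    static double subsequentPageFraction(List<Integer> requestedPages) {
        if (requestedPages.isEmpty()) {
            return 0.0;
        }
        long later = 0;
        for (int page : requestedPages) {
            if (page > 1) {
                later++;
            }
        }
        return (double) later / requestedPages.size();
    }
}
```

If that fraction comes back small, a cache for "next page" requests has
little to work with.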

The thing to remember is that every search/sort *must* score and/or sort
all the matching documents, to catch the case where the very last
document in the index is the best match. So a method that
returned only matches N through N+pagesize would save just the
time/memory needed to copy matches 0 through N, and each
ScoreDoc is just an int and a float. You can copy a LOT of
ScoreDocs around before you notice.
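
Here is roughly what option (b) looks like, as a sketch rather than a
definitive implementation. To keep it self-contained it works on a plain
int[] of doc ids standing in for the scoreDocs array you get back from
search(); HitPager, hitsNeeded and pageOf are names I made up, and pages are
counted from zero:

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: pages through a top-N result array.
final class HitPager {
    /** How many top hits to request so that the given zero-based page is covered. */
    static int hitsNeeded(int page, int pageSize) {
        checkArgs(page, pageSize);
        return (page + 1) * pageSize;
    }

    /** Returns the hits for the zero-based page, or an empty array past the end. */
    static int[] pageOf(int[] topHits, int page, int pageSize) {
        checkArgs(page, pageSize);
        // Clamp both ends so a short final page or an out-of-range page is safe.
        int from = Math.min(page * pageSize, topHits.length);
        int to = Math.min(from + pageSize, topHits.length);
        return Arrays.copyOfRange(topHits, from, to);
    }

    private static void checkArgs(int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
    }
}
```

For each "Next" click you would re-run the search asking for
hitsNeeded(page, count) results and show pageOf(...) of what comes back. The
discarded prefix is the cheap part.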

What a caching scheme *would* save is re-executing the query. But long
before I went to a caching scheme, I'd try to understand why my queries
were slow. Especially when you couple that with the fact that the
overwhelming majority of users don't page very far into the result set
before changing the query.

From the eXtreme Programming people: "Do the simplest thing
that could possibly work". I add the addendum "Then *measure* to see
what the problems are before 'fixing' anything".
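
For the measuring part, even something this crude is a start. It is only a
sketch: QueryTimer is a made-up name, the Runnable stands in for whatever
runs your search, and wall-clock averages like this are noisy (JIT warm-up,
GC, disk cache), so treat the numbers as rough:

```java
// Hypothetical helper, not part of Lucene: rough wall-clock timing of a task.
final class QueryTimer {
    /** Runs the task the given number of times and returns the average time per run in nanoseconds. */
    static long averageNanos(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / runs;
    }
}
```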

FWIW
Erick


On Wed, Feb 18, 2009 at 10:29 PM, <rolarenfan@earthlink.net> wrote:

> R2.4
>
> So, I may well be missing something here, but: I use
>
> <pseudoCode>IndexSearcher.search(someQuery, null, count, new
> Sort());</pseudoCode>
>
> to get an instance of TopFieldDocs (the "Hits" is deprecated). So far, all
> fine; I get a bunch of documents. Now, what is the Lucene-best-practice for
> getting the *next* batch of size "count"? (Didn't see this discussed
> anywhere, but maybe I missed it.)
>
> a) I could guess that my users will never want more than "N*count", for
> some value of N, request that right up front, and do all my own "paging"
> using the one TopFieldDocs instance;
>
> b) I could assume that (a) will be an inefficient memory and time hog, and
> when the user clicks "Next" (or whatever), then ... (with i starting at "1")
> get a new TopFieldDocs with "(++i)*count", and out of that discard the first
> "i*count" items? In the limit (as i => N) that uses up just as much space
> and memory, but does so lazily (better);
>
> c) some compromise of (a) and (b), where I get M*count, do my own paging,
> and when the user asks for the (i+1)==(M+1)-th batch, then get another
> M*count (maybe faster, but also maybe bigger amortized memory footprint);
>
> d) something else? (I'd hope for something like a search() method with some
> parameter saying, in effect, "such and such a range of hits" ...)
>
> thanks,
> Paul
