lucene-general mailing list archives

From Steven A Rowe <sar...@syr.edu>
Subject RE: Pagination in Lucene
Date Thu, 22 Jan 2009 18:09:59 GMT
Hi lucenegal,

You'll get much quicker/better responses if you use the
java-user@lucene.apache.org list instead of this list, which has a
relatively small audience.

On 01/21/2009 at 8:43 PM, lucenegal wrote:
> Does the Lucene support pagination for search results ? Some of the
> documentation suggests to requery for each page. The results can be 1M+
> in my case , what is general recommendation in this situation ?

Lucene's (indirect) support for pagination of search results (the Hits
class) has been deprecated as of version 2.4.0 and will be removed in
version 3.0.0:

<http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Hits.html>

Hits provides an iterator over the results, caching a fixed-size window
of 200 hits (roughly; I'm not sure of this number).  When all of the docs
in the cache have been iterated over, the search is performed again, and the
cache is populated with the next window of hits from the complete list
of hits, if there are more hits available.  In your case of 1M+ hits,
the query would be re-executed 1M+/200 = 5K+ times!
 
If you look at the top of the javadocs for the Hits class at the link
above, a non-deprecated alternative is given.  Essentially, you must
take control of the results caching/pagination yourself.
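To make the requery-per-page approach concrete, here is a minimal pure-Java sketch. It uses no Lucene classes: `Hit`, `page()` and the in-memory list of matches are hypothetical stand-ins for the searcher and its scored results. Each page request rescans every match but keeps only the best (page+1)*pageSize hits in a bounded min-heap, which is roughly what asking a top-N collector for N = (page+1)*pageSize hits and then skipping the first page*pageSize of them amounts to.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class PagingSketch {
    // Hypothetical stand-in for a scored hit: a document id plus its score.
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Best-first order: higher score first, lower doc id breaks ties.
    static final Comparator<Hit> BEST_FIRST =
        Comparator.comparingDouble((Hit h) -> h.score).reversed()
                  .thenComparingInt(h -> h.doc);

    /**
     * Returns hits [page * pageSize, (page + 1) * pageSize) in best-first order.
     * Every call rescans all matches (a "requery"), but only the best
     * n = (page + 1) * pageSize hits are kept, so memory is O(n), not O(matches).
     */
    static List<Hit> page(Iterable<Hit> matches, int page, int pageSize) {
        if (page < 0 || pageSize < 1) {
            throw new IllegalArgumentException("page >= 0 and pageSize >= 1 required");
        }
        int n = (page + 1) * pageSize;
        // Worst-first heap: the root is the weakest hit currently kept.
        PriorityQueue<Hit> heap = new PriorityQueue<>(n, BEST_FIRST.reversed());
        for (Hit h : matches) {
            if (heap.size() < n) {
                heap.add(h);
            } else if (BEST_FIRST.compare(h, heap.peek()) < 0) {
                heap.poll(); // evict the weakest kept hit
                heap.add(h);
            }
        }
        List<Hit> top = new ArrayList<>(heap);
        top.sort(BEST_FIRST);
        int from = Math.min(page * pageSize, top.size());
        return new ArrayList<>(top.subList(from, top.size()));
    }

    public static void main(String[] args) {
        // Hypothetical matches: doc i scores i / 10, so doc 9 is the best hit.
        List<Hit> matches = new ArrayList<>();
        for (int i = 0; i < 10; i++) matches.add(new Hit(i, i / 10f));
        for (Hit h : page(matches, 1, 3)) System.out.println(h.doc);
    }
}
```

With the ten hypothetical matches in main, page(matches, 1, 3) prints docs 6, 5 and 4. For 1M+ matches and, say, page 10 of size 10, only 110 hits are held at once, though each page still costs a full pass over the matches.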

See an example of this in Lucene's SearchFiles demo, in the
doPagingSearch() method (at the bottom of the file):

<http://svn.apache.org/viewvc/lucene/java/tags/lucene_2_4_0/src/demo/org/apache/lucene/demo/SearchFiles.java?view=markup>
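A related strategy, sketched here in plain Java, is to collect several pages' worth of hits in one search and to re-run the search with a larger N only when the user pages past what is cached. `CachingPager`, `searchTopN` and `pagesAhead` are hypothetical names, and the search is faked over 100 in-memory ids; in Lucene 2.4 the `searchTopN` hook would be something like running the query with a `TopDocCollector` of size n and reading `collector.topDocs().scoreDocs`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

public class CachingPager {
    // Hypothetical search hook: given n, returns up to n hit ids, best first.
    private final IntFunction<List<Integer>> searchTopN;
    private final int pageSize;
    private final int pagesAhead;
    private List<Integer> cache = new ArrayList<>();
    private int cachedN = 0; // the n passed to the most recent search
    int searches = 0;        // number of times the search has been run

    CachingPager(IntFunction<List<Integer>> searchTopN, int pageSize, int pagesAhead) {
        this.searchTopN = searchTopN;
        this.pageSize = pageSize;
        this.pagesAhead = pagesAhead;
    }

    /** Returns the hits for the given 0-based page, re-searching only if needed. */
    List<Integer> page(int page) {
        int end = (page + 1) * pageSize;
        // If the last search returned fewer than cachedN hits, every hit is
        // already cached and searching again cannot produce more.
        boolean allCached = cache.size() < cachedN;
        if (end > cachedN && !allCached) {
            // Re-run the search, collecting this page plus pagesAhead more.
            cachedN = end + pagesAhead * pageSize;
            cache = searchTopN.apply(cachedN);
            searches++;
        }
        int from = Math.min(page * pageSize, cache.size());
        int to = Math.min(end, cache.size());
        return new ArrayList<>(cache.subList(from, to));
    }

    public static void main(String[] args) {
        // Hypothetical index with 100 matching docs, ids 0..99 in best-first order.
        IntFunction<List<Integer>> fakeSearch = n -> {
            List<Integer> hits = new ArrayList<>();
            for (int doc = 0; doc < Math.min(n, 100); doc++) hits.add(doc);
            return hits;
        };
        CachingPager pager = new CachingPager(fakeSearch, 10, 4);
        System.out.println(pager.page(0)); // first search collects 50 hits
        System.out.println(pager.page(4)); // served from the cache
        System.out.println(pager.page(5)); // second search collects 100 hits
        System.out.println("searches = " + pager.searches);
    }
}
```

The number of searches grows with how deep the user pages, not with the total number of matches, so a 1M+ hit query that the user only browses for a few pages is searched once or twice.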

Mark Harwood has posted a class called HitPageCollector, which manages
some of these details for you:

<http://markmail.org/message/wlmoznq6mpxjkbav>

Steve

