lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: how do I paginate Lucene search results deeply
Date Thu, 14 Mar 2013 10:17:40 GMT
You could also use Lucene's "search after" capability.

It's designed for exactly this use-case (deep paging).

See https://issues.apache.org/jira/browse/LUCENE-2215
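
To illustrate the idea, here is a minimal stdlib-only sketch of what search-after does conceptually. It skips every hit at or before the last hit of the previous page and keeps only n entries in the heap, rather than collecting page+n hits and throwing most of them away. The names (SearchAfterSketch, pageAfter) are made up for this example and the code is not Lucene's implementation. It also assumes unique sort keys. In Lucene itself the entry point is IndexSearcher.searchAfter, which takes the last ScoreDoc of the previous page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class SearchAfterSketch {
    /**
     * Returns the n smallest keys strictly greater than 'after', ascending.
     * Pass null as 'after' to get the first page. Hypothetical helper.
     */
    public static List<Long> pageAfter(long[] sortKeys, Long after, int n) {
        // Max-heap bounded at n entries: its head is the worst hit still on the page.
        PriorityQueue<Long> heap = new PriorityQueue<>(n, Collections.reverseOrder());
        for (long key : sortKeys) {
            if (after != null && key <= after) {
                continue; // already shown on an earlier page
            }
            if (heap.size() < n) {
                heap.add(key);
            } else if (key < heap.peek()) {
                heap.poll();   // evict the current worst hit
                heap.add(key);
            }
        }
        List<Long> page = new ArrayList<>(heap);
        Collections.sort(page);
        return page;
    }

    public static void main(String[] args) {
        long[] keys = {50, 10, 40, 20, 30, 60};
        List<Long> first = pageAfter(keys, null, 2);
        List<Long> second = pageAfter(keys, first.get(first.size() - 1), 2);
        System.out.println(first + " " + second); // prints "[10, 20] [30, 40]"
    }
}
```

The heap never grows beyond n entries no matter how deep the page is, which is why this scales where the page+n window does not.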

Mike McCandless

http://blog.mikemccandless.com

On Thu, Mar 14, 2013 at 6:03 AM, Toke Eskildsen <te@statsbiblioteket.dk> wrote:
> On Thu, 2013-03-14 at 04:11 +0100, dizh wrote:
>> each document has a timestamp identifying the time at which it was indexed. I
>> want to search the documents with a sort, where the sort field is the timestamp,
>
> [...]
>
>> but when you do paging, for example in a web app, and the user wants to go
>> to the last page, 4999980-5000000, well, it is slow...
>
> Yes. The problem is that it performs a sliding window search with a
> window size of page+topX and that does not work well with 5M entries,
> especially since it uses a heap, which works very well for small windows
> but horribly for large windows.
>
>> I have a large number of Log4J logs, and I want to index them and
>> present them using web ui.
>
> I still don't see why you would want to page to 5M, but okay.
>
> Instead of representing the timestamps directly, convert them to unique
> longs when indexing. Guessing that you always have fewer than 1000 log
> entries/ms, your long would be
>   (timestamp_in_ms << 10) | counter++
> where the counter is reset to 0 each time a different timestamp is
> encountered. This also ensures that the order of your log entries is
> preserved. Let's call the modified timestamps utime.
>
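
A minimal sketch of that encoding, assuming timestamps arrive in non-decreasing order and at most 1024 entries share a millisecond (the 10 low bits). The class and method names (UtimeEncoder, next, timestampOf) are hypothetical; the low bits are combined with a bitwise OR so the counter fills the bits freed by the shift.

```java
public class UtimeEncoder {
    private static final int COUNTER_BITS = 10;                       // room for 1024 entries per ms
    private static final long MAX_COUNTER = (1L << COUNTER_BITS) - 1; // 1023

    private long lastTimestampMs = Long.MIN_VALUE;
    private long counter = 0;

    /** Returns a unique, order-preserving long for the given millisecond timestamp. */
    public long next(long timestampMs) {
        if (timestampMs != lastTimestampMs) {
            lastTimestampMs = timestampMs;
            counter = 0; // new millisecond: restart the per-ms counter
        }
        if (counter > MAX_COUNTER) {
            throw new IllegalStateException("more than 1024 entries in one millisecond");
        }
        return (timestampMs << COUNTER_BITS) | counter++;
    }

    /** Recovers the original millisecond timestamp from a utime. */
    public static long timestampOf(long utime) {
        return utime >> COUNTER_BITS;
    }

    public static void main(String[] args) {
        UtimeEncoder enc = new UtimeEncoder();
        System.out.println(enc.next(1000)); // 1024000
        System.out.println(enc.next(1000)); // 1024001
        System.out.println(enc.next(1001)); // 1025024
    }
}
```

Failing loudly when the counter overflows is a design choice for the sketch; silently spilling into the timestamp bits would break both uniqueness and ordering.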
> When you do a paginated search for 20 results, keep track of the last
> utime. When you request the next page, add a NumericRangeFilter going
> from the last utime (non-inclusive) with no upper limit and ask for the
> top-20 results again.
>
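
The paging loop described above can be sketched without Lucene by letting a sorted set stand in for the index. This assumes results are sorted ascending by utime, and the names (UtimePaging, nextPage) are invented for the example. The tailSet call plays the role of the range filter with an exclusive lower bound and no upper bound.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class UtimePaging {
    /** Returns up to pageSize utimes strictly greater than lastUtime, ascending. */
    public static List<Long> nextPage(NavigableSet<Long> index, long lastUtime, int pageSize) {
        List<Long> page = new ArrayList<>(pageSize);
        // tailSet(..., false): exclusive lower bound, no upper bound.
        for (Long utime : index.tailSet(lastUtime, false)) {
            if (page.size() == pageSize) {
                break; // only the top pageSize results are needed
            }
            page.add(utime);
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableSet<Long> index = new TreeSet<>();
        for (long u = 1; u <= 7; u++) {
            index.add(u * 100);
        }
        long last = Long.MIN_VALUE; // nothing seen before the first page
        List<Long> page;
        while (!(page = nextPage(index, last, 3)).isEmpty()) {
            System.out.println(page); // [100, 200, 300], then [400, 500, 600], then [700]
            last = page.get(page.size() - 1);
        }
    }
}
```

Note that this only supports stepping to the next page from a known last utime; jumping straight to an arbitrary page number is not covered by the scheme.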
>
> NB: Please get rid of the garbage that follows each of your posts on
> this mail list. The Confidentiality Notice has negative value here.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
