lucene-java-user mailing list archives

From Toke Eskildsen ...@statsbiblioteket.dk>
Subject Re: 500 millions document for loop.
Date Thu, 12 Nov 2015 17:42:58 GMT
Valentin Popov <valentin.po@gmail.com> wrote:

> We have ~10 indexes holding 500M documents. Each document
> has an «archive date» and a «to» address, and one of our tasks is
> to calculate statistics on «to» for the last year. Right now we
> search for archive_date:(current_date - 1 year) and paginate the
> results at 50k records per page. The bottleneck of that approach
> is pagination: on a powerful server it takes ~20 days to
> execute, which is far too long.

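Lucene handles deep page requests poorly because of the way the internal Priority Queue works: to return page N, the queue must hold every hit up to and including that page, so each successive page costs more than the one before it.
Solr has CursorMark, which avoids this and should be fairly simple to emulate in your Lucene handling code: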

http://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
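
To make the cursor idea concrete, here is a minimal sketch in plain Java with no Lucene dependency. It assumes the hits can be ordered by a unique sort key, and it models the index as a sorted map from that key to the «to» address. The class name, method names, and sample data are all hypothetical. In real Lucene code, IndexSearcher.searchAfter plays the role of pageAfter here: you pass it the last hit of the previous page instead of a page offset.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: cursor-based iteration over a sorted result set.
final class CursorPagingSketch {

    /** Stand-in for an index: unique sort key -> "to" address (made-up data). */
    static NavigableMap<Long, String> sampleIndex() {
        NavigableMap<Long, String> index = new TreeMap<>();
        index.put(1L, "a@example.com");
        index.put(2L, "b@example.com");
        index.put(3L, "a@example.com");
        index.put(4L, "c@example.com");
        index.put(5L, "a@example.com");
        return index;
    }

    /** Returns up to pageSize entries whose key is strictly greater than cursor (null means start). */
    static List<Map.Entry<Long, String>> pageAfter(
            NavigableMap<Long, String> index, Long cursor, int pageSize) {
        // Skip directly past the cursor instead of re-collecting all earlier pages.
        NavigableMap<Long, String> remaining =
                (cursor == null) ? index : index.tailMap(cursor, false);
        List<Map.Entry<Long, String>> page = new ArrayList<>();
        for (Map.Entry<Long, String> entry : remaining.entrySet()) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(entry);
        }
        return page;
    }

    /** Walks the whole index page by page and counts documents per "to" address. */
    static Map<String, Integer> countRecipients(NavigableMap<Long, String> index, int pageSize) {
        Map<String, Integer> counts = new HashMap<>();
        Long cursor = null;
        while (true) {
            List<Map.Entry<Long, String>> page = pageAfter(index, cursor, pageSize);
            if (page.isEmpty()) {
                break; // no hits left after the cursor
            }
            for (Map.Entry<Long, String> hit : page) {
                counts.merge(hit.getValue(), 1, Integer::sum);
            }
            // The sort key of the last hit becomes the cursor for the next page.
            cursor = page.get(page.size() - 1).getKey();
        }
        return counts;
    }
}
```

Calling CursorPagingSketch.countRecipients(CursorPagingSketch.sampleIndex(), 2) walks the five sample documents in three pages. The point of the design is that every page request carries only the last sort key seen, so the work per page stays about the same however deep you are in the result set.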

- Toke Eskildsen
