lucene-java-user mailing list archives

From Valentin Popov <valentin...@gmail.com>
Subject Re: 500 millions document for loop.
Date Thu, 12 Nov 2015 18:00:08 GMT
Toke, I just looked through our code, and we are already using that approach:

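// Excerpt: p, pages, startIdx, query, queryFilter and sort are defined earlier in the method.
IndexSearcher indexSearcher = getIndexSearcher(searchResult);

// Cursor: the last hit of the previous page; searchAfter() returns only hits that sort after it.
ScoreDoc currentScoreDoc = p.startScoreDoc;
for (int page = 1; page < pages - 1; page++) {
    TopDocs topDocs = indexSearcher.searchAfter(currentScoreDoc, query, queryFilter,
            searchResult.getPageSize(), sort);
    int endpos = topDocs.scoreDocs.length - 1;
    if (endpos < 0) {
        break; // empty page: the result set is exhausted
    }
    startIdx += topDocs.scoreDocs.length;
    // The last hit of this page (also when the page holds a single hit) is the next cursor.
    currentScoreDoc = topDocs.scoreDocs[endpos];
    searchResult.setPage(currentScoreDoc, startIdx);

    if (searchResult.getCancelled()) {
        return searchResult;
    }
}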
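The loop above relies on Lucene's IndexSearcher.searchAfter. To show the cursor idea in isolation, here is a minimal plain-Java sketch with no Lucene dependency (the names CursorPaging, Hit, searchAfter and iterateAll are hypothetical, not Lucene API): each page keeps only the pageSize best hits that sort strictly after the last hit of the previous page, and ties on the sort key are broken by doc id so the cursor position is unambiguous.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class CursorPaging {

    /** Hypothetical hit: a document id plus the value it is sorted on. */
    static final class Hit {
        final int doc;
        final long sortKey;

        Hit(int doc, long sortKey) {
            this.doc = doc;
            this.sortKey = sortKey;
        }

        @Override
        public String toString() {
            return doc + ":" + sortKey;
        }
    }

    /** Total order: by sort key, ties broken by doc id so the cursor is unambiguous. */
    static final Comparator<Hit> ORDER =
            Comparator.<Hit>comparingLong(h -> h.sortKey).thenComparingInt(h -> h.doc);

    /** Next pageSize hits sorting strictly after the cursor (from the top if cursor is null). */
    static List<Hit> searchAfter(List<Hit> all, Hit cursor, int pageSize) {
        // Max-heap by ORDER: the head is the worst of the best hits collected so far.
        PriorityQueue<Hit> pq = new PriorityQueue<>(ORDER.reversed());
        for (Hit h : all) {
            if (cursor != null && ORDER.compare(h, cursor) <= 0) {
                continue; // at or before the cursor: returned on an earlier page
            }
            pq.offer(h);
            if (pq.size() > pageSize) {
                pq.poll(); // drop the hit that sorts last
            }
        }
        List<Hit> page = new ArrayList<>(pq);
        page.sort(ORDER);
        return page;
    }

    /** Walks the whole result set page by page; returns the hits in the order visited. */
    static List<Hit> iterateAll(List<Hit> all, int pageSize) {
        List<Hit> visited = new ArrayList<>();
        Hit cursor = null;
        while (true) {
            List<Hit> page = searchAfter(all, cursor, pageSize);
            if (page.isEmpty()) {
                break; // result set exhausted
            }
            visited.addAll(page);
            cursor = page.get(page.size() - 1); // last hit becomes the cursor
        }
        return visited;
    }

    public static void main(String[] args) {
        List<Hit> all = new ArrayList<>();
        for (int doc = 0; doc < 10; doc++) {
            all.add(new Hit(doc, doc % 3)); // duplicate keys exercise the doc-id tie-break
        }
        System.out.println(iterateAll(all, 4));
    }
}
```

Note that each call in this sketch still scans every hit; what stays bounded is the queue, which never holds more than pageSize + 1 entries however deep the page is.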


> On 12 Nov 2015, at 20:42, Toke Eskildsen <te@statsbiblioteket.dk> wrote:
> 
> Valentin Popov <valentin.po@gmail.com> wrote:
> 
>> We have ~10 indexes for 500M documents. Each document
>> has an «archive date» and a «to» address, and one of our tasks is
>> to calculate statistics of «to» for the last year. Right now we
>> search for archive_date:(current_date - 1 year) and paginate the
>> results at 50k records per page. The bottleneck of that approach
>> is pagination: it takes too long, and even on a powerful server
>> the job takes ~20 days to run.
> 
> Lucene does not like deep page requests due to the way the internal Priority Queue works:
> to return hits N to N+n it has to collect the top N+n hits and then discard the first N.
> Solr has CursorMark, which should be fairly simple to emulate in your Lucene handling code:
> 
> http://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
> 
> - Toke Eskildsen
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
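Toke's point about the priority queue can be illustrated without Lucene. The plain-Java sketch below is not Lucene's actual collector (OffsetPaging, Page and page are hypothetical names): with offset paging, serving the hits ranked [start, start + n) needs a bounded queue of start + n entries, so a 50k-hit page starting at hit 10,000,000 would need a queue of 10,050,000 entries.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

final class OffsetPaging {

    /** A page of hits plus the largest queue size needed to produce it. */
    static final class Page {
        final List<Long> hits;
        final int peakQueueSize;

        Page(List<Long> hits, int peakQueueSize) {
            this.hits = hits;
            this.peakQueueSize = peakQueueSize;
        }
    }

    /** Offset paging: returns the hits ranked [start, start + n) in ascending key order. */
    static Page page(List<Long> keys, int start, int n) {
        int k = start + n; // every hit up to the end of the window must be kept
        PriorityQueue<Long> pq = new PriorityQueue<>(Collections.reverseOrder()); // max-heap
        int peak = 0;
        for (long key : keys) {
            pq.offer(key);
            if (pq.size() > k) {
                pq.poll(); // evict the largest key: it cannot be among the top k
            }
            peak = Math.max(peak, pq.size());
        }
        List<Long> top = new ArrayList<>(pq);
        Collections.sort(top);
        // The first `start` hits were collected only to locate the window; drop them.
        int from = Math.min(start, top.size());
        return new Page(new ArrayList<>(top.subList(from, top.size())), peak);
    }

    public static void main(String[] args) {
        List<Long> keys = new ArrayList<>();
        for (long i = 0; i < 1000; i++) {
            keys.add((i * 37) % 1000); // a permutation of 0..999
        }
        for (int start : new int[] {0, 500, 950}) {
            Page p = page(keys, start, 50);
            System.out.println("start=" + start + " first=" + p.hits.get(0)
                    + " queue=" + p.peakQueueSize);
        }
    }
}
```

Running main reports queue sizes of 50, 550 and 1000 for the three page offsets: the page size is fixed, but the work and memory grow with how deep the page is.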

Regards,
Valentin Popov