lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: 500 millions document for loop.
Date Thu, 12 Nov 2015 17:50:45 GMT
Hi,

The big question is: Do you need the results paged at all? Do you need them sorted? If not,
the easiest approach is to use a custom Collector that does no sorting and just consumes the
results.
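
As a rough sketch of that idea (not Lucene code): the point of a collector is that every matching document is visited exactly once and aggregated on the spot, so nothing is scored, sorted, or paged. In the example below plain arrays stand in for the index, and ToStatsSketch, CountingCollector and search are hypothetical names made up for illustration. In Lucene 5.x the corresponding hook would be the collect(int doc) method of a SimpleCollector subclass passed to IndexSearcher.search(Query, Collector); how the «to» value is read per document depends on how the field was indexed and is left out here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntConsumer;

// Self-contained sketch: plain arrays stand in for the index, and none of
// these names are Lucene API.
final class ToStatsSketch {

    /** Hypothetical stand-in for a Collector: sees each matching doc ID once. */
    static final class CountingCollector implements IntConsumer {
        private final String[] toByDoc; // stand-in for a per-document "to" lookup
        private final Map<String, Long> counts = new HashMap<>();

        CountingCollector(String[] toByDoc) {
            this.toByDoc = toByDoc;
        }

        @Override
        public void accept(int doc) {
            // Aggregate immediately; nothing is ranked, queued, or paged.
            counts.merge(toByDoc[doc], 1L, Long::sum);
        }

        Map<String, Long> counts() {
            return counts;
        }
    }

    /** Stand-in for the search call: streams matches to the collector in doc-ID order. */
    static void search(long[] archiveDateByDoc, long minDate, IntConsumer collector) {
        for (int doc = 0; doc < archiveDateByDoc.length; doc++) {
            if (archiveDateByDoc[doc] >= minDate) {
                collector.accept(doc);
            }
        }
    }

    public static void main(String[] args) {
        // Made-up sample data: four documents, dates encoded as yyyyMMdd.
        String[] to = {"a@example.org", "b@example.org", "a@example.org", "c@example.org"};
        long[] archiveDate = {20150101L, 20150601L, 20151101L, 20130101L};

        CountingCollector collector = new CountingCollector(to);
        search(archiveDate, 20141112L, collector); // "last year" cutoff
        System.out.println(collector.counts());
    }
}
```

The cost of such a pass grows with the number of matching documents only, whereas deep paging repeats work for every page requested.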

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Valentin Popov [mailto:valentin.po@gmail.com]
> Sent: Thursday, November 12, 2015 6:48 PM
> To: java-user@lucene.apache.org
> Subject: Re: 500 millions document for loop.
> 
> Toke, thanks!
> 
> We will look at this solution; it looks like this is what we need.
> 
> 
> > On 12 Nov 2015, at 20:42, Toke Eskildsen <te@statsbiblioteket.dk> wrote:
> >
> > Valentin Popov <valentin.po@gmail.com> wrote:
> >
> >> We have ~10 indexes holding 500M documents. Each document
> >> has an «archive date» and a «to» address, and one of our tasks is
> >> to calculate statistics on «to» for the last year. Right now we
> >> search for archive_date:(current_date - 1 year) and paginate the
> >> results at 50k records per page. The bottleneck of that approach
> >> is that pagination takes too long: even on a powerful server it
> >> takes ~20 days to execute.
> >
> > Lucene does not like deep page requests due to the way the internal
> > Priority Queue works. Solr has CursorMark, which should be fairly simple to
> > emulate in your Lucene handling code:
> >
> > http://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
> >
> > - Toke Eskildsen
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> 
> Regards,
> Valentin Popov
> 

