lucene-java-user mailing list archives

From Valentin Popov <valentin...@gmail.com>
Subject Re: 500 millions document for loop.
Date Thu, 12 Nov 2015 17:54:16 GMT
Hi,

> 
> Hi,
> 
> The big question is: Do you need the results paged at all?

Yup, because if we return all results at once, we get an OOM (OutOfMemoryError).
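
A minimal sketch of the cursor idea from Toke's message quoted further down, in plain Java with no Lucene calls: each page is fetched relative to the last key seen rather than by offset, so memory stays bounded by the page size and earlier hits are never re-collected. The class `CursorPaging`, its methods, and the integer ids are hypothetical stand-ins; in Lucene itself the closest equivalent is `IndexSearcher.searchAfter`, which takes the last `ScoreDoc` of the previous page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for an index: a sorted set of document ids.
class CursorPaging {

    /** Returns up to pageSize ids strictly greater than cursor (null means "from the start"). */
    static List<Integer> nextPage(TreeSet<Integer> ids, Integer cursor, int pageSize) {
        List<Integer> page = new ArrayList<>(pageSize);
        // tailSet seeks straight to the cursor; hits before it are not visited again.
        Iterable<Integer> rest = (cursor == null) ? ids : ids.tailSet(cursor, false);
        for (Integer id : rest) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(id);
        }
        return page;
    }

    /** Walks the whole set page by page and returns how many ids were visited. */
    static int visitAll(TreeSet<Integer> ids, int pageSize) {
        int visited = 0;
        Integer cursor = null;
        while (true) {
            List<Integer> page = nextPage(ids, cursor, pageSize);
            if (page.isEmpty()) {
                break;
            }
            visited += page.size();
            cursor = page.get(page.size() - 1); // last key seen becomes the next cursor
        }
        return visited;
    }

    public static void main(String[] args) {
        TreeSet<Integer> ids = new TreeSet<>();
        for (int i = 0; i < 1000; i++) {
            ids.add(i * 3);
        }
        System.out.println(visitAll(ids, 64));
    }
}
```

Only one page is held in memory at a time, and the cost of a page does not grow with how far into the result set it lies.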

> Do you need them sorted?

Nope. 

> If not, the easiest approach is to use a custom Collector that does no sorting and just consumes the results.

The main bottleneck, as I see it, comes from the next-page search, which takes ~2-4 seconds per page.
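
A minimal sketch of the "consume without sorting" idea Uwe describes, in plain Java with no Lucene calls: every hit is handed to a callback that tallies the «to» address, so nothing is paged, sorted, or kept per hit. The class `ToAddressStats` and its methods are hypothetical names. In Lucene the callback would be a Collector's `collect(int doc)`; how the «to» value is read for each document is not stated in this thread and is left out here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical consumer: receives one «to» address per matching document.
class ToAddressStats implements Consumer<String> {
    // Memory grows with the number of distinct addresses, not the number of hits.
    private final Map<String, Long> counts = new HashMap<>();

    @Override
    public void accept(String toAddress) {
        counts.merge(toAddress, 1L, Long::sum);
    }

    /** Number of hits seen for the given address (0 if never seen). */
    long count(String toAddress) {
        return counts.getOrDefault(toAddress, 0L);
    }

    /** Number of distinct addresses seen so far. */
    int distinct() {
        return counts.size();
    }

    public static void main(String[] args) {
        ToAddressStats stats = new ToAddressStats();
        // Stand-in for the stream of hits a search would deliver.
        String[] hits = {"a@example.org", "b@example.org", "a@example.org"};
        for (String to : hits) {
            stats.accept(to);
        }
        System.out.println(stats.count("a@example.org") + " " + stats.distinct());
    }
}
```

With this shape there is no "next page" search at all: the query runs once and each hit is consumed as it arrives.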

> 
> Uwe
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
> 
>> -----Original Message-----
>> From: Valentin Popov [mailto:valentin.po@gmail.com]
>> Sent: Thursday, November 12, 2015 6:48 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: 500 millions document for loop.
>> 
>> Toke, thanks!
>> 
>> We will look at this solution; it looks like exactly what we need.
>> 
>> 
>>> On 12 Nov 2015, at 20:42, Toke Eskildsen <te@statsbiblioteket.dk> wrote:
>>> 
>>> Valentin Popov <valentin.po@gmail.com> wrote:
>>> 
>>>> We have ~10 indexes holding 500M documents. Each document
>>>> has an «archive date» and a «to» address, and one of our tasks is
>>>> to calculate statistics on «to» for the last year. Right now we
>>>> search on archive_date:(current_date - 1 year) and paginate the
>>>> results at 50k records per page. The bottleneck of that approach is
>>>> that pagination takes too long: even on a powerful server the run
>>>> takes ~20 days.
>>> 
>>> Lucene does not like deep page requests due to the way the internal Priority Queue works. Solr has CursorMark, which should be fairly simple to emulate in your Lucene handling code:
>>> 
>>> http://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
>>> 
>>> - Toke Eskildsen
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>> 
>> 
>> Regards,
>> Valentin Popov
>> 
> 


Regards,
Valentin Popov





