From: Toke Eskildsen
To: java-user@lucene.apache.org
Subject: Re: 500 millions document for loop.
Date: Thu, 12 Nov 2015 17:42:58 +0000

Valentin Popov wrote:

> We have ~10 indexes for 500M documents, each document
> has «archive date», and «to» address, one of our task is
> calculate statistics of «to» for last year. Right now we are
> using search archive_date:(current_date - 1 year) and paginate
> results for 50k records for page. Bottleneck of that approach,
> pagination take too long time and on powerful server it take
> ~20 days to execute, and it is very long.

Lucene handles deep paging poorly because of the way its internal priority queue works: to return hits N to N+50,000, the searcher has to collect the top N+50,000 hits into the queue, so every page costs more than the one before it. Solr has CursorMark, which should be fairly simple to emulate in your Lucene handling code:

http://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/

- Toke Eskildsen
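The difference between offset paging and cursor paging can be sketched without Lucene at all. The following is a hypothetical, library-free illustration, not Lucene or Solr code: documents are modelled as plain int sort keys (for example archive dates as epoch days), and the keys are assumed to be unique, much as CursorMark requires the sort to include a unique field as a tie-breaker. Offset paging needs a queue of offset + pageSize entries. Cursor paging skips everything up to the last key already returned, so its queue never grows beyond pageSize. In Lucene itself the closest built-in counterpart is `IndexSearcher.searchAfter`, which takes the last `ScoreDoc` of the previous page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch contrasting offset paging with cursor ("search after") paging.
 * Keys stand in for documents and are assumed unique.
 */
public class CursorPagingSketch {

    /** Offset paging: the queue has to hold offset + pageSize entries. */
    static List<Integer> pageByOffset(int[] keys, int offset, int pageSize) {
        int capacity = offset + pageSize;
        // Max-heap: when the queue overflows, the largest key is evicted,
        // so the smallest `capacity` keys survive.
        PriorityQueue<Integer> queue = new PriorityQueue<>(Collections.reverseOrder());
        for (int key : keys) {
            queue.offer(key);
            if (queue.size() > capacity) {
                queue.poll();
            }
        }
        List<Integer> sorted = new ArrayList<>(queue);
        Collections.sort(sorted);
        // Everything before the requested page was collected only to be thrown away.
        return sorted.subList(Math.min(offset, sorted.size()), sorted.size());
    }

    /** Cursor paging: only keys after the cursor compete, so the queue never exceeds pageSize. */
    static List<Integer> pageAfter(int[] keys, Integer cursor, int pageSize) {
        PriorityQueue<Integer> queue = new PriorityQueue<>(Collections.reverseOrder());
        for (int key : keys) {
            if (cursor != null && key <= cursor) {
                continue; // already returned on an earlier page
            }
            queue.offer(key);
            if (queue.size() > pageSize) {
                queue.poll();
            }
        }
        List<Integer> sorted = new ArrayList<>(queue);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        int[] keys = {42, 7, 19, 3, 88, 61, 25, 14, 50, 33};
        int pageSize = 3;
        Integer cursor = null; // no cursor before the first page
        List<Integer> page;
        while (!(page = pageAfter(keys, cursor, pageSize)).isEmpty()) {
            System.out.println(page);
            cursor = page.get(page.size() - 1); // remember the last key seen
        }
    }
}
```

Running `main` walks the whole result set page by page while keeping memory bounded by the page size. Each page still scans all matching keys, but the queue stays small, which is the part that makes deep offset requests expensive.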