Subject: Re: 500 millions document for loop.
From: Valentin Popov
Date: Thu, 12 Nov 2015 20:47:33 +0300
To: java-user@lucene.apache.org
Message-Id: <0CE3CC53-9FB4-498C-8B5E-3CD4632CDF62@gmail.com>
In-Reply-To: <1447350178757.58991@statsbiblioteket.dk>

Toke, thanks!

We will look at this solution; it looks like this is what we need.

> On 12 Nov 2015, at 20:42, Toke Eskildsen wrote:
>
> Valentin Popov wrote:
>
>> We have ~10 indexes for 500M documents, each document
>> has an «archive date» and a «to» address. One of our tasks is
>> to calculate statistics of «to» for the last year.
>> Right now we are
>> using the search archive_date:(current_date - 1 year) and paginating
>> results at 50k records per page. The bottleneck of that approach is
>> that pagination takes too long: on a powerful server it takes
>> ~20 days to execute, which is very long.
>
> Lucene does not like deep page requests due to the way the internal Priority Queue works. Solr has CursorMark, which should be fairly simple to emulate in your Lucene handling code:
>
> http://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
>
> - Toke Eskildsen
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

Regards,
Valentin Popov
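
For readers of the archive, the cursor idea from Toke's reply can be sketched in plain Java. This is a toy in-memory model, not Lucene or Solr code: the `Doc` class, `nextPage`, and `scanAll` are hypothetical names invented for the illustration. Each page keeps only `pageSize` candidates in the priority queue and resumes strictly after the last document of the previous page, so the cost per page does not grow with depth the way offset-based paging does. In Lucene itself the analogous entry point is, as far as I know, `IndexSearcher.searchAfter`, called with a sort that includes a unique tie-breaker.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class CursorPagingSketch {

    /** Hypothetical stand-in for an indexed document: a sort key plus a unique id as tie-breaker. */
    static final class Doc {
        final long archiveDate;
        final int docId;

        Doc(long archiveDate, int docId) {
            this.archiveDate = archiveDate;
            this.docId = docId;
        }
    }

    /** Total order: archive date first, then doc id, so no two docs compare equal. */
    static final Comparator<Doc> ORDER =
            Comparator.comparingLong((Doc d) -> d.archiveDate).thenComparingInt(d -> d.docId);

    /**
     * Returns the next pageSize docs that sort strictly after the cursor (null means "from the start").
     * The queue never holds more than pageSize entries, however deep the scan already is.
     */
    static List<Doc> nextPage(List<Doc> index, Doc cursor, int pageSize) {
        // Max-heap on ORDER: the worst of the current best candidates sits on top.
        PriorityQueue<Doc> queue = new PriorityQueue<>(pageSize, ORDER.reversed());
        for (Doc d : index) {
            if (cursor != null && ORDER.compare(d, cursor) <= 0) {
                continue; // already returned on an earlier page
            }
            if (queue.size() < pageSize) {
                queue.add(d);
            } else if (ORDER.compare(d, queue.peek()) < 0) {
                queue.poll();
                queue.add(d);
            }
        }
        List<Doc> page = new ArrayList<>(queue);
        page.sort(ORDER);
        return page;
    }

    /** Walks the whole index page by page, carrying the last doc of each page as the cursor. */
    static List<Doc> scanAll(List<Doc> index, int pageSize) {
        List<Doc> seen = new ArrayList<>();
        Doc cursor = null;
        while (true) {
            List<Doc> page = nextPage(index, cursor, pageSize);
            if (page.isEmpty()) {
                break;
            }
            seen.addAll(page);
            cursor = page.get(page.size() - 1);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Doc> index = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            index.add(new Doc((i * 7) % 5, i)); // made-up sample data
        }
        for (Doc d : scanAll(index, 3)) {
            System.out.println(d.archiveDate + " " + d.docId);
        }
    }
}
```

The tie-breaker on `docId` matters: without a total order, documents sharing the same archive date could be skipped or repeated at page boundaries.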