From dev-return-321078-archive-asf-public=cust-asf.ponee.io@lucene.apache.org  Fri May  4 15:11:34 2018
Return-Path: <dev-return-321078-archive-asf-public=cust-asf.ponee.io@lucene.apache.org>
X-Original-To: archive-asf-public@cust-asf.ponee.io
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by mx-eu-01.ponee.io (Postfix) with SMTP id DF9D9180634
	for <archive-asf-public@cust-asf.ponee.io>; Fri,  4 May 2018 15:11:33 +0200 (CEST)
Received: (qmail 2323 invoked by uid 500); 4 May 2018 13:11:32 -0000
Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:dev-help@lucene.apache.org>
List-Unsubscribe: <mailto:dev-unsubscribe@lucene.apache.org>
List-Post: <mailto:dev@lucene.apache.org>
List-Id: <dev.lucene.apache.org>
Reply-To: dev@lucene.apache.org
Delivered-To: mailing list dev@lucene.apache.org
Received: (qmail 2298 invoked by uid 99); 4 May 2018 13:11:31 -0000
Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142)
    by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 04 May 2018 13:11:31 +0000
Received: from localhost (localhost [127.0.0.1])
	by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 70C9B1807EC;
	Fri,  4 May 2018 13:11:31 +0000 (UTC)
X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org
X-Spam-Flag: NO
X-Spam-Score: 2.129
X-Spam-Level: **
X-Spam-Status: No, score=2.129 tagged_above=-999 required=6.31
	tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1,
	FREEMAIL_ENVFROM_END_DIGIT=0.25, HTML_MESSAGE=2,
	RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01,
	RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001] autolearn=disabled
Authentication-Results: spamd3-us-west.apache.org (amavisd-new);
	dkim=pass (2048-bit key) header.d=gmail.com
Received: from mx1-lw-us.apache.org ([10.40.0.8])
	by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024)
	with ESMTP id enSTRqYAu5uh; Fri,  4 May 2018 13:11:29 +0000 (UTC)
Received: from mail-ot0-f173.google.com (mail-ot0-f173.google.com [74.125.82.173])
	by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 40CAF5F1B4;
	Fri,  4 May 2018 13:11:29 +0000 (UTC)
Received: by mail-ot0-f173.google.com with SMTP id h8-v6so24391263otb.2;
        Fri, 04 May 2018 06:11:29 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=mime-version:from:date:message-id:subject:to;
        bh=14lOeVkVA39kJo9+9EiQKSxMKIWreeBs7XXv8DvlZ8o=;
        b=IaB4UXHC37ghQmAOT6EEXYLWKvozkwzbecAEghAWHVIBGh0EljsI1yei+vu59G9WEo
         umhr2Lvs/z5Eg2NgAqFc6YQ6maWguz+Axmu2M/ZxRpjEDz5cWXpTHf4uc3vk3WIVH8Dx
         Uew8n8EFU8DTlpxsmr6ogMd3IipQHHwonQSp1ojNqWL/SmXS3NfiXDh3m9UB20d7TNDg
         sUQSMiWwF2y711s4fj5yaW57EoS9hWXysZVmrNWryYFyITiF138YPNYbq9ic6Lnu+TqC
         KCa5EMF/h1EUJQZrKdkV5czZ0URP2yeuTxFx9A1vCDj1iL4qdrxf8m1cLFydi7lNmmP/
         DImA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:mime-version:from:date:message-id:subject:to;
        bh=14lOeVkVA39kJo9+9EiQKSxMKIWreeBs7XXv8DvlZ8o=;
        b=hzkhwNJuyOLclKGP95OGZtu/xJ2z4SK56O60meVjNPX6g7mnCsTaPAMPptDV5pdmft
         J36s9XbH78hDQdrf2JEi9DodtqU/0OAGYMUDkQ8FiMWOicQzU0J0bp55PQ5s2FmpB/zi
         XNwH7j3HmmrySfPUrftvd9+lJwQgbUlwG1ReZCzGR1Go0QsQs8HwNipM+EiwJ1mvpXAo
         DX6mp4RwqQfY0toMJ0aykFW5qdwVPNvtPtcWnJ0L2CwuMPMh3bpalJC35ucDM2EvyuRD
         7TaqRzVmkYfAzmch/DYMt29hHwimM0EWFmDZ9mIQj62h/LSaao1hP493fZcEY5T1BPUE
         k1Cw==
X-Gm-Message-State: ALQs6tC4vqyUFcaWTaEbXvG/9UkaQL043af8/KunjO+Zd1w6bLerYREg
	mmhBImcoFjH7omCXw6Rr3igNQDivcIC94WMDt5o=
X-Google-Smtp-Source: AB8JxZq3O8OW+1J96BfZFNYa28CXP0HA2X7WmrSPjffRbuEu+0piI7eYJHY+gFGUHNSSIyVJw7fCH0KTnpxwWdqr1Rw=
X-Received: by 2002:a9d:4117:: with SMTP id o23-v6mr18525783ote.21.1525439488230;
 Fri, 04 May 2018 06:11:28 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.201.120.67 with HTTP; Fri, 4 May 2018 06:11:27 -0700 (PDT)
From: manish gupta <tomanishgupta18@gmail.com>
Date: Fri, 4 May 2018 18:41:27 +0530
Message-ID: <CALEWSYnABTXsRNF0N8M4Zwxs8Wj1Z3syoZUYu58z-Cr8Usbd2Q@mail.gmail.com>
Subject: Query on searchAfter API usage in IndexSearcher
To: dev@lucene.apache.org, general@lucene.apache.org
Content-Type: multipart/alternative; boundary="000000000000d28905056b610e68"

--000000000000d28905056b610e68
Content-Type: text/plain; charset="UTF-8"

Hi Team,

I am new to Lucene and I am trying to use Lucene for text search in my
project to achieve better results in terms of query performance.

Initially I was facing a lot of GC issues while using Lucene, because I
was calling the search API and passing the total document count as the
number of hits to return. As my data size is around 4 billion, the number
of documents created by Lucene was huge. Internally the search API uses
TopScoreDocCollector, which creates a PriorityQueue sized to the given
document count, thus causing a lot of GC.
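
To make the allocation pattern concrete, here is a minimal sketch in plain
Java. It is not Lucene's TopScoreDocCollector: the class TopKSketch and the
method topK are made up for illustration, and the only assumption is that a
top-N collector keeps a bounded heap of at most N entries.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

public class TopKSketch {
    /** Returns the k highest scores in descending order, retaining at most k entries at a time. */
    static float[] topK(float[] scores, int k) {
        // Min-heap: the lowest retained score sits at the head so it can be evicted cheaply.
        PriorityQueue<Float> heap = new PriorityQueue<>(Math.max(1, k));
        for (float s : scores) {
            if (heap.size() < k) {
                heap.add(s);
            } else if (k > 0 && s > heap.peek()) {
                heap.poll();
                heap.add(s);
            }
        }
        float[] out = new float[heap.size()];
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = heap.poll(); // the heap yields the smallest first, so fill from the back
        }
        return out;
    }

    public static void main(String[] args) {
        float[] scores = {0.3f, 0.9f, 0.1f, 0.7f, 0.5f};
        System.out.println(Arrays.toString(topK(scores, 2))); // prints [0.9, 0.7]
    }
}
```

Memory here grows with k, not with the number of matches, so passing the
full document count as k lets the heap grow to hold every match.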

*To avoid this problem I am trying to query in a paginated way, wherein
I query only 10 documents at a time and then use the searchAfter API to
query further, passing the lastScoreDoc from the previous result. This has
resolved the GC problem, but the query time has increased by a huge
margin, from 3 sec to 600 sec.*

*When I debugged, I found that even though I use the searchAfter API, it
does not avoid the IO, and every time it reads the data from disk again.
It only skips the results returned by the previous search. Is my
understanding correct? If yes, please let me know if there is a better way
to query the results incrementally, so as to avoid GC with minimal impact
on query performance.*
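
The behaviour I think I am seeing can be modelled with the plain Java
sketch below. It is only a model of my understanding, not Lucene code: Hit,
nextPage and hitsVisited are hypothetical names, and the in-memory list
stands in for the matches that a real query would read from the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class SearchAfterSketch {
    /** Hypothetical stand-in for a scored hit; not Lucene's ScoreDoc. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Result order: higher score first, ties broken by lower doc id.
    static final Comparator<Hit> BEST_FIRST = (a, b) -> {
        int c = Float.compare(b.score, a.score);
        return c != 0 ? c : Integer.compare(a.doc, b.doc);
    };

    static long hitsVisited = 0; // how many matches all page requests have touched so far

    /** Returns the next n hits after 'after' (null for the first page), scanning every match. */
    static List<Hit> nextPage(List<Hit> matches, Hit after, int n) {
        // The worst retained hit sits at the head so a better one can replace it.
        PriorityQueue<Hit> heap = new PriorityQueue<>(Math.max(1, n), BEST_FIRST.reversed());
        for (Hit h : matches) {
            hitsVisited++;
            if (after != null && BEST_FIRST.compare(h, after) <= 0) {
                continue; // already returned on an earlier page: skipped, but still visited
            }
            if (heap.size() < n) {
                heap.add(h);
            } else if (n > 0 && BEST_FIRST.compare(h, heap.peek()) < 0) {
                heap.poll();
                heap.add(h);
            }
        }
        List<Hit> page = new ArrayList<>(heap);
        page.sort(BEST_FIRST);
        return page;
    }

    public static void main(String[] args) {
        List<Hit> matches = Arrays.asList(
            new Hit(0, 0.2f), new Hit(1, 0.9f), new Hit(2, 0.5f),
            new Hit(3, 0.9f), new Hit(4, 0.1f), new Hit(5, 0.7f));
        Hit after = null;
        int requests = 0;
        while (true) {
            List<Hit> page = nextPage(matches, after, 2);
            requests++;
            if (page.isEmpty()) break;
            after = page.get(page.size() - 1);
        }
        // prints "4 requests visited 24 hits"
        System.out.println(requests + " requests visited " + hitsVisited + " hits");
    }
}
```

In this model each page request keeps only n hits in memory but walks all
matches again, so the total work grows with the number of pages times the
number of matches.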

Regards
Manish Gupta

--000000000000d28905056b610e68--