From: manish gupta
Date: Mon, 7 May 2018 18:14:12 +0530
Subject: Query on searchAfter API usage in IndexSearcher
To: java-user@lucene.apache.org

Hi Team,

I am new to Lucene and am trying to use it for text search in my project to get better query performance.

Initially I was facing a lot of GC issues because I was using the search API and passing the count of all documents as the number of results. As my data size is around 4 billion, the number of documents created by Lucene was huge. Internally, the search API uses TopScoreDocCollector, which creates a PriorityQueue sized to the given document count, and this caused a lot of GC.

To avoid this problem I am now querying in a paginated way: I query only 10 documents at a time and then use the searchAfter API to query further, passing the last ScoreDoc from the previous result. This has resolved the GC problem, but the query time has increased by a huge margin, from 3 sec to 600 sec.

When I debugged, I found that even though I use the searchAfter API, it does not avoid the IO; every time, it reads the data from disk again. It only skips the results that were filled in by the previous search. Is my understanding correct? If yes, please let me know if there is a better way to query the results incrementally, so as to avoid GC with minimal impact on query performance.

Regards
Manish Gupta
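
To make the cost of the paginated approach described above concrete, here is a minimal plain-Java model of cursor-style paging. It is a sketch under stated assumptions, not Lucene's implementation: the names SearchAfterSketch, Hit, page and the scanned counter are all hypothetical, and the "index" is just an array of scores. It assumes that every page re-runs the full scan, keeps only a bounded heap of pageSize entries, and skips hits that rank at or before the last hit of the previous page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical model of cursor-based paging; not Lucene code.
class SearchAfterSketch {

    /** Stand-in for a scored hit: document id plus score. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
        @Override public String toString() { return "doc=" + doc + " score=" + score; }
    }

    // Best-first order: higher score first, ties broken by lower doc id.
    static final Comparator<Hit> BEST_FIRST = (a, b) -> {
        int c = Float.compare(b.score, a.score);
        return c != 0 ? c : Integer.compare(a.doc, b.doc);
    };

    // Counts every hit visited, across all calls to page().
    static long scanned = 0;

    /** Returns the next pageSize hits ranked strictly after the cursor (null = first page). */
    static List<Hit> page(float[] scores, Hit after, int pageSize) {
        // Bounded heap whose head is the worst hit currently kept.
        PriorityQueue<Hit> heap = new PriorityQueue<>(pageSize + 1, BEST_FIRST.reversed());
        for (int doc = 0; doc < scores.length; doc++) {
            scanned++; // the scan itself is repeated for every page
            Hit h = new Hit(doc, scores[doc]);
            if (after != null && BEST_FIRST.compare(h, after) <= 0) {
                continue; // already returned on an earlier page
            }
            heap.offer(h);
            if (heap.size() > pageSize) {
                heap.poll(); // evict the worst so memory stays at pageSize
            }
        }
        List<Hit> out = new ArrayList<>(heap);
        out.sort(BEST_FIRST);
        return out;
    }

    public static void main(String[] args) {
        float[] scores = {0.5f, 0.9f, 0.1f, 0.7f, 0.3f};
        Hit after = null;
        while (true) {
            List<Hit> p = page(scores, after, 2);
            if (p.isEmpty()) break;
            System.out.println(p);
            after = p.get(p.size() - 1); // cursor for the next page
        }
        System.out.println("hits visited in total: " + scanned);
    }
}
```

In this model the heap never holds more than pageSize entries, which is why memory pressure goes away, but the number of hits visited grows with the number of pages requested. Under the same assumption, a larger page size would trade some memory for fewer full passes.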