From: Naresh Yadav
Date: Mon, 19 Jan 2015 19:47:31 +0530
Subject: Re: Need Debug Direction on Performance Problem
To: solr-user@lucene.apache.org

Michael, I tried your idea of implementing my own cursor in Solr 4.6.1 itself, but somehow that test case was taking a huge amount of time. I then tried the cursor approach after upgrading Solr to 4.10.3 and got better results: for Setup 2 the time dropped from 114 minutes to 18 minutes, though that is still far from Setup 1's 2 minutes. The very first 50-thousand-document request by itself takes about a minute, so I may need to look at other things, since pagination seems to be working well now. Thanks for the valuable suggestions.

On Mon, Jan 19, 2015 at 11:20 AM, Naresh Yadav wrote:

> Toke, I won't be able to use TermsComponent, as I have complex filter
> criteria on other fields.
>
> Michael, I understood your idea of paging without using start=. I will
> prototype it, since it is possible in my use case too, and post here the
> results I get with this approach.
>
> On Sun, Jan 18, 2015 at 10:05 PM, Michael Sokolov <
> msokolov@safaribooksonline.com> wrote:
>
>> You can also implement your own cursor easily enough if you have a
>> unique sort key (not relevance score).
>> Say you can sort by id: you select batch 1 (50k docs, say) and record
>> the last (maximum) id in the batch. For the next batch, you limit the
>> query to id > last_id and again take the first 50k docs (don't use
>> start= for paging). This scales much better when scanning a large
>> result set; you get roughly constant time per batch across the whole
>> set, instead of the time increasing as you page deeper.
>>
>> -Mike
>>
>> On 1/18/2015 7:45 AM, Naresh Yadav wrote:
>>
>>> Hi Toke,
>>>
>>> Thanks for sharing Solr internals relevant to my problem. I will
>>> definitely try the cursor as well, but the only problem is that my
>>> current Solr version is 4.6.1, which I believe does not have cursor
>>> support. Is there any other option for this problem?
>>>
>>> Also, as per your suggestion, I will try to avoid regional units in
>>> posts.
>>>
>>> Thanks
>>> Naresh
>>>
>>> On Sun, Jan 18, 2015 at 4:19 PM, Toke Eskildsen wrote:
>>>
>>>> Naresh Yadav [nyadav.ait@gmail.com] wrote:
>>>>
>>>>> In both setups, we are reading in batches of 50k. In Setup 1 each
>>>>> batch takes approx 7 seconds, and completing all batches of the
>>>>> total 10 lakh results takes 1 to 2 minutes. In Setup 2 each batch
>>>>> takes approx 2-3 minutes, and completing all batches of the total
>>>>> 10 lakh results takes 114 minutes.
>>>>
>>>> Deep paging across shards without cursors means that for each
>>>> request, the full result set up to that point must be requested from
>>>> each shard. The deeper the page, the longer each request takes. If
>>>> you only extracted 500K results instead of the 1M in Setup 2, it
>>>> would likely take a lot less than 114/2 minutes.
>>>>
>>>> Since you are exporting the full result set, you should be using a
>>>> cursor:
>>>> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
>>>> This should make your extraction linear in the number of documents
>>>> and hopefully a lot faster than your current setup.
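[Editor's note: Toke's cursor suggestion can be sketched roughly as follows. This is a minimal illustration of the cursorMark loop (available in Solr 4.7+); the HTTP request to /select is stubbed out as an injectable `fetch` callable so the pagination logic stands alone. The parameter names `q`, `sort`, `rows`, and `cursorMark` follow the Solr reference guide; everything else here is a hypothetical stand-in, not a real client API.]

```python
# Minimal sketch of Solr cursorMark deep paging (Solr 4.7+).
# `fetch` stands in for an HTTP call to Solr's /select handler and
# must return (docs, nextCursorMark); it is injected here so the
# loop can be run and tested without a live Solr instance.

def paginate_with_cursor(fetch, rows=50000):
    """Yield batches of docs until the cursor stops advancing."""
    cursor = "*"  # initial cursorMark value per the Solr documentation
    while True:
        docs, next_cursor = fetch({
            "q": "*:*",
            "sort": "id asc",      # cursor paging requires a total sort order
            "rows": rows,
            "cursorMark": cursor,
        })
        yield docs
        if next_cursor == cursor:  # unchanged cursor => result set exhausted
            return
        cursor = next_cursor
```

The key difference from start=-based paging is that each request carries the opaque cursorMark from the previous response instead of a growing offset, so no shard ever has to re-collect the results already consumed.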
>>>>
>>>> Also, please refrain from using regional units such as "lakh" in an
>>>> international forum. It requires some readers (me, for example) to
>>>> perform a search in order to be sure what you are talking about.
>>>>
>>>> - Toke Eskildsen
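[Editor's note: for readers stuck on a pre-4.7 Solr like Naresh's 4.6.1, Michael's do-it-yourself cursor from earlier in the thread can be sketched like this. Each batch is constrained with a Solr range filter whose lower bound is exclusive (`id:{last_id TO *]`), and the HTTP call is again stubbed as an injectable `fetch` callable, which is a hypothetical stand-in, not a real client API.]

```python
# Sketch of Michael's manual cursor: sort by a unique key and, for each
# batch, filter to ids strictly greater than the last id already seen.
# `fetch` stands in for an HTTP request to Solr's /select handler and
# must return the list of matching docs (each a dict with an "id").

def paginate_by_id(fetch, batch_size=50000):
    """Yield batches; start= stays at 0, so every request is shallow."""
    last_id = None
    while True:
        params = {
            "q": "*:*",
            "sort": "id asc",
            "rows": batch_size,
            "start": 0,  # never page with start=; filter on id instead
        }
        if last_id is not None:
            # Solr range syntax: '{' makes the lower bound exclusive.
            params["fq"] = "id:{%s TO *]" % last_id
        docs = fetch(params)
        if not docs:
            return
        yield docs
        last_id = docs[-1]["id"]
```

As Michael notes, this keeps the cost of every request constant regardless of how deep into the result set you are, which is exactly the property cursorMark later provided natively.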