Date: Sat, 29 Jun 2013 23:58:42 +0100
Subject: Re: Improving performance to return 2000+ documents
From: Peter Sturge
To: solr-user@lucene.apache.org

Hello Utkarsh,

This may or may not be relevant for your use-case, but the way we deal
with this scenario is to retrieve the top N documents 5, 10, 20, or 100
at a time (user selectable). We can then page the results, changing the
start parameter to return the next set. This allows us to 'retrieve'
millions of documents - we just do it at the user's leisure, rather than
make them wait for the whole lot in one go.

This works well because users very rarely want to see ALL 2000 (or
whatever number) documents at one time - it's simply too much to take in
at once.

If your use-case involves an automated or offline procedure (e.g.
running a report or some data-mining op), then presumably it doesn't
matter so much if it takes a bit longer (as long as it returns in some
reasonable time).

Have you looked at doing paging on the client side? This will hugely
speed up your search time.

HTH
Peter

On Sat, Jun 29, 2013 at 6:17 PM, Erick Erickson wrote:

> Well, depending on how many docs get served
> from the cache the time will vary. But this is
> just ugly; if you can avoid this use-case it would
> be a Good Thing.
>
> Problem here is that each and every shard must
> assemble the list of 2,000 documents (just ID and
> sort criteria, usually score).
>
> Then the node serving the original request merges
> the sub-lists to pick the top 2,000. Then the node
> sends another request to each shard to get
> the full document. Then the node merges this
> into the full list to return to the user.
>
> Solr really isn't built for this use-case; is it actually
> a compelling situation?
>
> And having your document cache set at 1M is kinda
> high if you have very big documents.
>
> FWIW,
> Erick
>
>
> On Fri, Jun 28, 2013 at 8:44 PM, Utkarsh Sengar wrote:
>
> > Also, I don't see a consistent response time from Solr. I ran ab
> > again and I get this:
> >
> > ubuntu@ip-10-149-6-68:~$ ab -c 10 -n 500 "http://x.amazonaws.com:8983/solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json"
> >
> > Benchmarking x.amazonaws.com (be patient)
> > Completed 100 requests
> > Completed 200 requests
> > Completed 300 requests
> > Completed 400 requests
> > Completed 500 requests
> > Finished 500 requests
> >
> >
> > Server Software:
> > Server Hostname:        x.amazonaws.com
> > Server Port:            8983
> >
> > Document Path:          /solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
> > Document Length:        1538537 bytes
> >
> > Concurrency Level:      10
> > Time taken for tests:   10.858 seconds
> > Complete requests:      500
> > Failed requests:        8
> >    (Connect: 0, Receive: 0, Length: 8, Exceptions: 0)
> > Write errors:           0
> > Total transferred:      769297992 bytes
> > HTML transferred:       769268492 bytes
> > Requests per second:    46.05 [#/sec] (mean)
> > Time per request:       217.167 [ms] (mean)
> > Time per request:       21.717 [ms] (mean, across all concurrent requests)
> > Transfer rate:          69187.90 [Kbytes/sec] received
> >
> > Connection Times (ms)
> >               min  mean[+/-sd] median   max
> > Connect:        0     0    0.3      0     2
> > Processing:   110   215   72.0    190   497
> > Waiting:       91   180   70.5    152   473
> > Total:        112   216   72.0    191   497
> >
> > Percentage of the requests served within a certain time (ms)
> >   50%    191
> >   66%    225
> >   75%    252
> >   80%    272
> >   90%    319
> >   95%    364
> >   98%    420
> >   99%    453
> >  100%    497 (longest request)
> >
> >
> > Sometimes it takes a lot of time, sometimes it's pretty quick.
> >
> > Thanks,
> > -Utkarsh
> >
> >
> > On Fri, Jun 28, 2013 at 5:39 PM, Utkarsh Sengar wrote:
> >
> > > Hello,
> > >
> > > I have a use case where I need to retrieve the top 2000 documents
> > > matching a query.
> > > What are the parameters (in query, solrconfig, schema) I should
> > > look at to improve this?
> > >
> > > I have 45M documents in a 3-node SolrCloud 4.3.1 cluster with
> > > 3 shards, with 30GB RAM, 8vCPU and 7GB JVM heap size.
> > >
> > > I have documentCache:
> > > initialSize="1000000" autowarmCount="0"/>
> > >
> > > allText is a copyField.
> > >
> > > This is the result I get:
> > > ubuntu@ip-10-149-6-68:~$ ab -c 10 -n 500 "http://x.amazonaws.com:8983/solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json"
> > >
> > > Benchmarking x.amazonaws.com (be patient)
> > > Completed 100 requests
> > > Completed 200 requests
> > > Completed 300 requests
> > > Completed 400 requests
> > > Completed 500 requests
> > > Finished 500 requests
> > >
> > >
> > > Server Software:
> > > Server Hostname:        x.amazonaws.com
> > > Server Port:            8983
> > >
> > > Document Path:          /solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
> > > Document Length:        1538537 bytes
> > >
> > > Concurrency Level:      10
> > > Time taken for tests:   35.999 seconds
> > > Complete requests:      500
> > > Failed requests:        21
> > >    (Connect: 0, Receive: 0, Length: 21, Exceptions: 0)
> > > Write errors:           0
> > > Non-2xx responses:      2
> > > Total transferred:      766221660 bytes
> > > HTML transferred:       766191806 bytes
> > > Requests per second:    13.89 [#/sec] (mean)
> > > Time per request:       719.981 [ms] (mean)
> > > Time per request:       71.998 [ms] (mean, across all concurrent requests)
> > > Transfer rate:          20785.65 [Kbytes/sec] received
> > >
> > > Connection Times (ms)
> > >               min  mean[+/-sd] median   max
> > > Connect:        0     0    0.6      0     8
> > > Processing:     9   717 2339.6    199 12611
> > > Waiting:        9   635 2233.6    164 12580
> > > Total:          9   718 2339.6    199 12611
> > >
> > > Percentage of the requests served within a certain time (ms)
> > >   50%    199
> > >   66%    236
> > >   75%    263
> > >   80%    281
> > >   90%    548
> > >   95%    838
> > >   98%  12475
> > >   99%  12545
> > >  100%  12611 (longest request)
> > >
> > > --
> > > Thanks,
> > > -Utkarsh
> > >
> >
> > --
> > Thanks,
> > -Utkarsh
> >
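
The paging approach Peter describes at the top of this message (fetch a
small page, then change the start parameter to get the next set) could be
sketched roughly as below. This is only an illustration, not code from
the thread: the helper names page_starts and page_url are made up here,
the base URL is modeled on the select URL quoted above, and the page size
of 100 is an arbitrary choice. It only builds the request URLs and does
not contact a server.

```python
from urllib.parse import urlencode

# Assumption: base URL modeled on the select URL quoted in the thread.
BASE_URL = "http://x.amazonaws.com:8983/solr/prodinfo/select"


def page_starts(total, page_size):
    """Return the `start` offsets needed to cover `total` documents."""
    return list(range(0, total, page_size))


def page_url(query, page, page_size=100, base_url=BASE_URL):
    """Build the select URL for one page of results (page is 0-based).

    `start` and `rows` are the standard Solr paging parameters; the
    helper itself is hypothetical.
    """
    params = {
        "q": query,
        "start": page * page_size,  # offset of the first document on this page
        "rows": page_size,          # number of documents per page
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    query = "allText:huggies diapers size 1"
    # Fetching 2000 documents as 20 pages of 100 instead of one large response.
    for page, start in enumerate(page_starts(2000, 100)[:3]):
        print(start, page_url(query, page))
```

Each request then asks every shard for a much smaller list of IDs and
scores than rows=2000 does, and later pages are only requested if the
user actually asks for them.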