From: Siddhant Goel
To: solr-user@lucene.apache.org
Subject: Re: Solr Performance Issues
Date: Fri, 12 Mar 2010 22:36:47 +0530
Message-ID: <582430d51003120906t12a27bd4w3150eef8e2bc8e8e@mail.gmail.com>
In-Reply-To: <359a92831003120602l4ce1e343x18ff343d468c1e5@mail.gmail.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=00c09f8c26057cfb0d04819d8efb

--00c09f8c26057cfb0d04819d8efb
Content-Type: text/plain; charset=ISO-8859-1

Hi,

Thanks for your responses. It actually feels good to be able to locate
where the bottlenecks are.

I've created two sets of data: in the first one I'm measuring the time
taken purely on Solr's end, and in the other one I'm including network
latency (just for reference). The data that I'm posting below contains the
time taken purely by Solr.

I'm running 10 threads simultaneously and the average response time (for
each query in each thread) remains close to 40 to 50 ms. But as soon as I
increase the number of threads to something like 100, the response time
goes up to ~600ms, and further up when the number of threads is close to
500. So yes, the average time definitely depends on the number of
concurrent requests.

> Going from memory, debugQuery=on will let you know how much time
> was spent in various operations in SOLR.
> It's important to know whether it was the searching, assembling the
> response, or transmitting the data back to the client.

I just tried this. The information that it gives me for a query that took
7165ms is at http://pastebin.ca/1835644

So out of the total time of 7165ms, QueryComponent took most of the time.
Plus I can see the load average going up when the number of threads is
really high, so it actually makes sense. (I didn't add any other component
while searching; it was a plain /select?q=query call.)

Like I mentioned earlier in this mail, I'm maintaining separate sets of
data with and without network latency, and I don't think network latency
is the bottleneck.

> How many threads does it take to peg the CPU? And what response times
> are you getting when your number of threads is around 10?
>

If the number of threads is greater than 100, that really takes its toll
on the CPU, so that's probably the number.

When the number of threads is around 10, the response times average
something like 60ms (and 95% of the queries fall within 100ms of that
value).

Thanks,

>
> Erick
>
> On Fri, Mar 12, 2010 at 3:39 AM, Siddhant Goel wrote:
>
> > I've allocated 4GB to Solr, so the rest of the 4GB is free for the OS
> > disk caching.
> >
> > I think that at any point of time, there can be a maximum of
> > <number of threads> concurrent requests, which happens to make sense
> > btw (does it?).
> >
> > As I increase the number of threads, the load average shown by top
> > goes up to as high as 80%. But if I keep the number of threads low
> > (~10), the load average never goes beyond ~8. So that's probably the
> > number of requests I can expect Solr to serve concurrently on this
> > index size with this hardware.
> >
> > Can anyone give a general opinion as to how much hardware should be
> > sufficient for a Solr deployment with an index size of ~43GB,
> > containing around 2.5 million documents? I'm expecting it to serve at
> > least 20 requests per second. Any experiences?
> >
> > Thanks
> >
> > On Fri, Mar 12, 2010 at 12:47 AM, Tom Burton-West wrote:
> >
> > > How much of your memory are you allocating to the JVM and how much
> > > are you leaving free?
> > >
> > > If you don't leave enough free memory for the OS, the OS won't have
> > > a large enough disk cache, and you will be hitting the disk for lots
> > > of queries.
> > >
> > > You might want to monitor your disk I/O using iostat and look at the
> > > iowait.
> > >
> > > If you are doing phrase queries and your *prx file is significantly
> > > larger than the available memory, then when a slow phrase query hits
> > > Solr, the contention for disk I/O with other queries could be
> > > slowing everything down.
> > >
> > > You might also want to look at the 90th and 99th percentile query
> > > times in addition to the average. For our large indexes, we found at
> > > least an order of magnitude difference between the average and 99th
> > > percentile queries. Again, if Solr gets hit with a few of those 99th
> > > percentile slow queries and you're not hitting your caches, chances
> > > are you will see serious contention for disk I/O.
> > >
> > > Of course if you don't see any waiting on I/O, then your bottleneck
> > > is probably somewhere else :)
> > >
> > > See
> > > http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
> > > for more background on our experience.
> > >
> > > Tom Burton-West
> > > University of Michigan Library
> > > www.hathitrust.org
> > >
> > > > On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel <siddhantgoel@gmail.com> wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > I have an index corresponding to ~2.5 million documents. The
> > > > > index size is 43GB.
> > > > > The configuration of the machine which is running Solr is:
> > > > > Dual Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) -
> > > > > 2 x 12MB cache, 8GB RAM, and 250 GB HDD.
> > > > >
> > > > > I'm observing a strange trend in the queries that I send to
> > > > > Solr. The query times for queries that I send earlier are much
> > > > > lower than for the queries I send afterwards. For instance, if
> > > > > I write a script to query Solr 5000 times (with 5000 distinct
> > > > > queries, most of them containing not more than 3-5 words) with
> > > > > 10 threads running in parallel, the average query time goes
> > > > > from ~50ms in the beginning to ~6000ms. Is this expected, or is
> > > > > there something wrong with my configuration? Currently I've
> > > > > configured the queryResultCache and the documentCache to
> > > > > contain 2048 entries (hit ratios for both are close to 50%).
> > > > >
> > > > > Apart from this, a general question that I want to ask is
> > > > > whether such hardware is enough for this scenario. I'm aiming
> > > > > at achieving around 20 queries per second with the hardware
> > > > > mentioned above.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Regards,
> > > > >
> > > > > --
> > > > > - Siddhant
> > > > >
> > >
> > > --
> > > - Siddhant
> > >
> > > --
> > > View this message in context:
> > > http://old.nabble.com/Solr-Performance-Issues-tp27864278p27868456.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> >
> > --
> > - Siddhant
>

--
- Siddhant

--00c09f8c26057cfb0d04819d8efb--
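For reference, the kind of measurement discussed in this thread (mean plus
90th and 99th percentile query times at a given number of threads) could be
scripted roughly as below. This is only a sketch under stated assumptions,
not the script used for the numbers above: the base URL, the use of wt=json,
and reading QTime from the response header are assumed, and the names
percentile, summarize, solr_qtime and run_load are made up for illustration.

```python
import json
import math
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of
    the data at or below it."""
    if not values:
        raise ValueError("no values to summarize")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(times_ms):
    """Mean plus 90th/99th percentiles, as suggested earlier in the thread."""
    return {
        "count": len(times_ms),
        "mean": sum(times_ms) / len(times_ms),
        "p90": percentile(times_ms, 90),
        "p99": percentile(times_ms, 99),
    }


def solr_qtime(query, base_url="http://localhost:8983/solr/select"):
    """Return the time in ms that Solr reports for one query.

    Assumptions: base_url points at the select handler, and the JSON
    response carries responseHeader.QTime.
    """
    params = urllib.parse.urlencode({"q": query, "wt": "json"})
    with urllib.request.urlopen(base_url + "?" + params) as resp:
        return json.load(resp)["responseHeader"]["QTime"]


def run_load(queries, threads, query_fn=solr_qtime):
    """Send all queries using `threads` concurrent workers and summarize
    the per-query times returned by query_fn."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return summarize(list(pool.map(query_fn, queries)))
```

Calling run_load(queries, threads=10) and then run_load(queries, threads=100)
with the same list of queries would give figures comparable to the ones
quoted above. Passing a different query_fn (for example one that times the
whole HTTP round trip) would cover the "with network latency" data set.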