From: Yonik Seeley
To: solr-user@lucene.apache.org
Date: Wed, 5 Nov 2008 10:25:34 -0500
Subject: Re: Throughput Optimization

You're probably hitting some contention with the locking around the
reading of index files... this has been recently improved in Lucene for
non-Windows boxes, and we're integrating that into Solr (should
definitely be in the next release).

-Yonik

On Tue, Nov 4, 2008 at 9:01 PM, wojtekpia wrote:
>
> I've been running load tests over the past week or two, and I can't figure out
> my system's bottleneck that prevents me from increasing throughput. First
> I'll describe my Solr setup, then what I've tried to optimize the system.
>
> I have 10 million records and 59 fields (all are indexed, 37 are stored, 17
> have termVectors, 33 are multi-valued) which takes about 15GB of disk space.
> Most field values are very short (single word or number), and usually only
> about half the fields have any data at all. I'm running on an 8-core, 64-bit,
> 32GB RAM Red Hat box. I allocate about 24GB of memory to the Java process,
> and my filterCache size is 700,000.
> I'm using a version of Solr between 1.3 and the
> current trunk (including the latest SOLR-667 (FastLRUCache) patch), and
> Tomcat 6.0.
>
> I'm running a ramp test, increasing the number of users every few minutes. I
> measure the maximum number of requests that Solr can handle per second with
> a fixed response time, and call that my throughput. I'd like to see a single
> physical resource be maxed out at some point during my test so I know it is
> my bottleneck. I generated random queries for my dataset representing a
> more or less realistic scenario. The queries include faceting by up to 6
> fields, and querying by up to 8 fields.
>
> I ran a baseline on the un-optimized setup, and saw peak CPU usage of about
> 50%, IO usage around 5%, and negligible network traffic. Interestingly, the
> CPU peaked when I had 8 concurrent users, and actually dropped down to about
> 40% when I increased the users beyond 8. Is that because I have 8 cores?
>
> I changed a few settings and observed the effect on throughput:
>
> 1. Increased filterCache size, and throughput increased by about 50%, but it
> seems to peak.
> 2. Put the entire index on a RAM disk, and significantly reduced the average
> response time, but my throughput didn't change (i.e. even though my response
> time was 10X faster, the maximum number of requests I could make per second
> didn't increase). This makes no sense to me, unless there is another
> bottleneck somewhere.
> 3. Reduced the number of records in my index. The throughput increased, but
> the shape of all my graphs stayed the same, and my CPU usage was identical.
>
> I have a few questions:
> 1. Can I get more than 50% CPU utilization?
> 2. Why does CPU utilization fall when I make more than 8 concurrent
> requests?
> 3. Is there an obvious bottleneck that I'm missing?
> 4. Does Tomcat have any settings that affect Solr performance?
>
> Any input is greatly appreciated.
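
For anyone following the lock-contention point in the reply at the top of
this message, here is a rough Java sketch of the difference between two ways
of letting many threads read one shared file. It is only an illustration and
not Lucene's or Solr's actual code: the names SharedFileReads, LockedReader
and PositionalReader are made up for this example, and it is my assumption
that the Lucene improvement mentioned above is of the second (positional)
kind. With the first reader, every request thread queues on the same lock,
which would be consistent with CPU usage stalling well below 100% even when
the index sits on a RAM disk.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration; none of these classes exist in Lucene or Solr.
final class SharedFileReads {

    // One shared handle with a single file pointer. seek() and read() must
    // happen as a pair, so all threads serialize on the same lock.
    static final class LockedReader implements AutoCloseable {
        private final RandomAccessFile file;

        LockedReader(Path path) throws IOException {
            this.file = new RandomAccessFile(path.toFile(), "r");
        }

        byte[] read(long position, int length) throws IOException {
            byte[] out = new byte[length];
            synchronized (file) {          // the contention point
                file.seek(position);
                file.readFully(out);
            }
            return out;
        }

        @Override
        public void close() throws IOException {
            file.close();
        }
    }

    // Positional read: the offset travels with each call, so there is no
    // shared file pointer to protect and no lock in this code.
    static final class PositionalReader implements AutoCloseable {
        private final FileChannel channel;

        PositionalReader(Path path) throws IOException {
            this.channel = FileChannel.open(path, StandardOpenOption.READ);
        }

        byte[] read(long position, int length) throws IOException {
            ByteBuffer buf = ByteBuffer.allocate(length);
            while (buf.hasRemaining()) {
                // read(dst, position) does not move the channel's own position
                int n = channel.read(buf, position + buf.position());
                if (n < 0) {
                    throw new EOFException("read past end of file");
                }
            }
            return buf.array();
        }

        @Override
        public void close() throws IOException {
            channel.close();
        }
    }
}
```

Both readers return the same bytes for the same offset and length; the only
difference is whether concurrent callers have to wait for each other. Whether
the positional variant actually runs in parallel depends on the JVM and
operating system, which may be why the reply limits the improvement to
non-Windows boxes.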