From: christophe <christophe@lemoine-fr.com>
Date: Wed, 05 Nov 2008 18:21:59 +0200
To: solr-user@lucene.apache.org
Subject: Re: Throughput Optimization
Message-ID: <4911C827.4020906@lemoine-fr.com>
In-Reply-To: <20343425.post@talk.nabble.com>

Does the number of searchers affect CPU usage?

I'm not completely sure, but I think some versions of Tomcat did not
scale well beyond 4 CPUs (or 4 cores).
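One quick thing to check while the test runs is whether the connector's
request threads, rather than the CPUs, are what tops out. Below is a rough
sketch using only the JDK management APIs; the Catalina MBean and attribute
names are what Tomcat 6 exposes by default, but they vary with the version
and connector, so treat them as assumptions:

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadMXBean;
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class ThreadPoolWatcher {
    public static void main(String[] args) throws Exception {
        // NB: the Catalina:* MBeans are only visible from inside Tomcat's JVM
        // (or over a remote JMX connection enabled with
        // -Dcom.sun.management.jmxremote); run standalone this finds no pools.
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        MBeanServerConnection mbs = ManagementFactory.getPlatformMBeanServer();

        // Tomcat registers one ThreadPool MBean per connector,
        // e.g. Catalina:type=ThreadPool,name=http-8080.
        Set<ObjectName> pools =
                mbs.queryNames(new ObjectName("Catalina:type=ThreadPool,*"), null);

        while (true) {
            System.out.printf("loadavg=%.2f jvmThreads=%d%n",
                    os.getSystemLoadAverage(), threads.getThreadCount());
            for (ObjectName pool : pools) {
                System.out.printf("  %s busy=%s max=%s%n",
                        pool.getKeyProperty("name"),
                        mbs.getAttribute(pool, "currentThreadsBusy"),
                        mbs.getAttribute(pool, "maxThreads"));
            }
            Thread.sleep(5000); // sample every 5 seconds
        }
    }
}

If currentThreadsBusy sits at maxThreads while the load average stays low,
the limit is in the container (or the load client) rather than the CPUs.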
C.

wojtekpia wrote:
> Yes, I am seeing evictions. I've tried setting my filterCache higher,
> but then I start getting Out Of Memory exceptions. My filterCache hit
> ratio is > .99. It looks like I've hit a RAM bound here.
>
> I ran a test without faceting. The response times / throughput were both
> significantly higher, there were no evictions from the filter cache, but
> I still wasn't getting > 50% CPU utilization. Any thoughts on what
> physical bound I've hit in this case?
>
> Erik Hatcher wrote:
>> One quick question.... are you seeing any evictions from your
>> filterCache? If so, it isn't set large enough to handle the faceting
>> you're doing.
>>
>>    Erik
>>
>> On Nov 4, 2008, at 8:01 PM, wojtekpia wrote:
>>
>>> I've been running load tests over the past week or 2, and I can't
>>> figure out my system's bottleneck that prevents me from increasing
>>> throughput. First I'll describe my Solr setup, then what I've tried
>>> to optimize the system.
>>>
>>> I have 10 million records and 59 fields (all are indexed, 37 are
>>> stored, 17 have termVectors, 33 are multi-valued) which takes about
>>> 15GB of disk space. Most field values are very short (single word or
>>> number), and usually about half the fields have any data at all. I'm
>>> running on an 8-core, 64-bit, 32GB RAM Redhat box. I allocate about
>>> 24GB of memory to the java process, and my filterCache size is
>>> 700,000. I'm using a version of Solr between 1.3 and the current
>>> trunk (including the latest SOLR-667 (FastLRUCache) patch), and
>>> Tomcat 6.0.
>>>
>>> I'm running a ramp test, increasing the number of users every few
>>> minutes. I measure the maximum number of requests that Solr can
>>> handle per second with a fixed response time, and call that my
>>> throughput. I'd like to see a single physical resource be maxed out
>>> at some point during my test so I know it is my bottleneck. I
>>> generated random queries for my dataset representing a more or less
>>> realistic scenario. The queries include faceting by up to 6 fields,
>>> and querying by up to 8 fields.
>>>
>>> I ran a baseline on the un-optimized setup, and saw peak CPU usage
>>> of about 50%, IO usage around 5%, and negligible network traffic.
>>> Interestingly, the CPU peaked when I had 8 concurrent users, and
>>> actually dropped down to about 40% when I increased the users
>>> beyond 8. Is that because I have 8 cores?
>>>
>>> I changed a few settings and observed the effect on throughput:
>>>
>>> 1. Increased filterCache size, and throughput increased by about
>>> 50%, but it seems to peak.
>>> 2. Put the entire index on a RAM disk, and significantly reduced the
>>> average response time, but my throughput didn't change (i.e. even
>>> though my response time was 10X faster, the maximum number of
>>> requests I could make per second didn't increase). This makes no
>>> sense to me, unless there is another bottleneck somewhere.
>>> 3. Reduced the number of records in my index. The throughput
>>> increased, but the shape of all my graphs stayed the same, and my
>>> CPU usage was identical.
>>>
>>> I have a few questions:
>>> 1. Can I get more than 50% CPU utilization?
>>> 2. Why does CPU utilization fall when I make more than 8 concurrent
>>> requests?
>>> 3. Is there an obvious bottleneck that I'm missing?
>>> 4. Does Tomcat have any settings that affect Solr performance?
>>>
>>> Any input is greatly appreciated.
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Throughput-Optimization-tp20335132p20335132.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
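On measuring throughput: it's also worth ruling out the load-generating
client itself as the bottleneck. A bare-bones ramp harness along these
lines (plain JDK; the Solr URL and the query/facet field names are made
up, so substitute your own) is enough to cross-check the numbers from a
full load tool:

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class RampTest {
    // Assumed endpoint and query; replace with your host and a realistic
    // mix of faceted queries.
    static final String BASE = "http://localhost:8983/solr/select";
    static final String QUERY = "field1:foo";

    public static void main(String[] args) throws Exception {
        for (int users = 1; users <= 64; users *= 2) {
            System.out.printf("%d concurrent users -> %.1f req/s%n",
                    users, run(users, 60));
        }
    }

    // Runs `users` threads for `seconds` seconds, returns requests per second.
    static double run(int users, int seconds) throws Exception {
        final AtomicLong completed = new AtomicLong();
        final long deadline = System.currentTimeMillis() + seconds * 1000L;
        ExecutorService pool = Executors.newFixedThreadPool(users);
        for (int i = 0; i < users; i++) {
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        String url = BASE + "?q=" + URLEncoder.encode(QUERY, "UTF-8")
                                + "&facet=true&facet.field=field2";
                        while (System.currentTimeMillis() < deadline) {
                            HttpURLConnection c =
                                    (HttpURLConnection) new URL(url).openConnection();
                            InputStream in = c.getInputStream();
                            byte[] buf = new byte[8192];
                            while (in.read(buf) != -1) { /* drain the response */ }
                            in.close();
                            completed.incrementAndGet();
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(seconds + 30, TimeUnit.SECONDS);
        return completed.get() / (double) seconds;
    }
}

Run it from a separate machine so it doesn't steal CPU from Solr; if the
curve still flattens at the same concurrency, the limit really is on the
server side.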