lucene-solr-user mailing list archives

From "Feak, Todd" <Todd.F...@smss.sony.com>
Subject RE: Throughput Optimization
Date Wed, 05 Nov 2008 16:30:19 GMT
If you are seeing < 90% CPU usage and are not IO (file or network)
bound, then you are most probably bound by lock contention. If your CPU
usage goes down as you throw more threads at the box, that's an even
stronger indication that contention is the issue.

A good profiling tool should help you locate this. I'm not endorsing it
in any way, but I've used YourKit locally and have been able to see
where the actual contention was coming from. That led to my interest in
the SOLR-667 cache fixes, which provided enormous benefit.
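
If you don't have a commercial profiler handy, the JDK's ThreadMXBean
gives a rough picture of monitor contention. The sketch below is mine,
not anything from Solr; it has to run inside the JVM you're measuring
(or be adapted to read the same beans over a remote JMX connection):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class ContentionCheck {
        public static void main(String[] args) throws InterruptedException {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            // Contention timing is off by default; enable it if the
            // VM supports it.
            if (mx.isThreadContentionMonitoringSupported()) {
                mx.setThreadContentionMonitoringEnabled(true);
            }
            Thread.sleep(10000); // sample while the load test is running
            for (ThreadInfo t : mx.getThreadInfo(mx.getAllThreadIds())) {
                // Blocked time is time spent waiting to enter a monitor
                // that some other thread holds.
                if (t != null && t.getBlockedTime() > 0) {
                    System.out.printf("%s: blocked %d ms over %d contentions%n",
                            t.getThreadName(), t.getBlockedTime(),
                            t.getBlockedCount());
                }
            }
        }
    }

Threads that pile up blocked time while the CPUs sit idle are all
queued on the same lock, which is the kind of contention the SOLR-667
cache fixes address.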

-Todd


-----Original Message-----
From: wojtekpia [mailto:wojtek_p@hotmail.com] 
Sent: Wednesday, November 05, 2008 8:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Throughput Optimization


Yes, I am seeing evictions. I've tried setting my filterCache higher,
but then I start getting Out Of Memory exceptions. My filterCache hit
ratio is > 0.99. It looks like I've hit a RAM bound here.
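
(For reference, this is configured in solrconfig.xml along these lines;
the autowarm count below is illustrative, not my exact setting:

    <filterCache
      class="solr.FastLRUCache"
      size="700000"
      initialSize="700000"
      autowarmCount="4096"/>

The RAM bound makes sense to me: a filter over a 10M-doc index that is
cached as a bitset costs roughly maxDoc/8 bytes, i.e. about 1.2MB per
entry, so a large cache full of big filters adds up quickly.)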

I ran a test without faceting. Response times and throughput were both
significantly better, and there were no evictions from the filter
cache, but I still wasn't getting > 50% CPU utilization. Any thoughts
on what physical bound I've hit in this case?



Erik Hatcher wrote:
> 
> One quick question.... are you seeing any evictions from your
> filterCache?  If so, it isn't set large enough to handle the faceting
> you're doing.
> 
> 	Erik
> 
> 
> On Nov 4, 2008, at 8:01 PM, wojtekpia wrote:
> 
>>
>> I've been running load tests over the past week or two, and I can't
>> figure out my system's bottleneck that prevents me from increasing
>> throughput. First I'll describe my Solr setup, then what I've tried
>> to optimize the system.
>>
>> I have 10 million records and 59 fields (all are indexed, 37 are
>> stored, 17 have termVectors, 33 are multi-valued), which takes about
>> 15GB of disk space. Most field values are very short (a single word
>> or number), and usually about half the fields have any data at all.
>> I'm running on an 8-core, 64-bit, 32GB RAM Red Hat box. I allocate
>> about 24GB of memory to the Java process, and my filterCache size is
>> 700,000. I'm using a version of Solr between 1.3 and the current
>> trunk (including the latest SOLR-667 (FastLRUCache) patch), and
>> Tomcat 6.0.
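>>
>> Heap-wise that just means Tomcat is started with something along
>> these lines (exact switches illustrative, not copied from my startup
>> script):
>>
>>     JAVA_OPTS="-server -Xms24g -Xmx24g"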
>>
>> I'm running a ramp test, increasing the number of users every few
>> minutes. I measure the maximum number of requests per second that
>> Solr can handle at a fixed response time, and call that my
>> throughput. I'd like to see a single physical resource maxed out at
>> some point during the test so I know it is my bottleneck. I generated
>> random queries for my dataset representing a more or less realistic
>> scenario. The queries include faceting by up to 6 fields and querying
>> by up to 8 fields.
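>>
>> A representative generated request (field names here are placeholders
>> rather than my real schema) looks something like:
>>
>>     http://localhost:8983/solr/select?q=type:book+AND+price:[10+TO+100]
>>         &facet=true&facet.field=author&facet.field=category&rows=10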
>>
>> I ran a baseline on the un-optimized setup and saw peak CPU usage of
>> about 50%, IO usage around 5%, and negligible network traffic.
>> Interestingly, the CPU peaked when I had 8 concurrent users, and
>> actually dropped to about 40% when I increased the users beyond 8. Is
>> that because I have 8 cores?
>>
>> I changed a few settings and observed the effect on throughput:
>>
>> 1. Increased the filterCache size: throughput increased by about
>> 50%, but it seems to have peaked.
>> 2. Put the entire index on a RAM disk: this significantly reduced the
>> average response time, but my throughput didn't change (i.e. even
>> though my response time was 10X faster, the maximum number of
>> requests I could make per second didn't increase). This makes no
>> sense to me, unless there is another bottleneck somewhere.
>> 3. Reduced the number of records in my index: throughput increased,
>> but the shape of all my graphs stayed the same, and my CPU usage was
>> identical.
>>
>> I have a few questions:
>> 1. Can I get more than 50% CPU utilization?
>> 2. Why does CPU utilization fall when I make more than 8 concurrent
>> requests?
>> 3. Is there an obvious bottleneck that I'm missing?
>> 4. Does Tomcat have any settings that affect Solr performance?
>>
>> Any input is greatly appreciated.