lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Miller <markrmil...@gmail.com>
Subject Re: Throughput Optimization
Date Wed, 05 Nov 2008 15:43:01 GMT
The latest alt directory patch uses It.

- Mark


On Nov 5, 2008, at 9:25 AM, "Yonik Seeley" <yonik@apache.org> wrote:

> You're probably hitting some contention with the locking around the
> reading of index files... this has been recently improved in Lucene
> for non-Windows boxes, and we're integrating that into Solr (should
> def be in the next release).
>
> -Yonik
>
> On Tue, Nov 4, 2008 at 9:01 PM, wojtekpia <wojtek_p@hotmail.com>  
> wrote:
>>
>> I've been running load tests over the past week or 2, and I can't  
>> figure out
>> my system's bottle neck that prevents me from increasing  
>> throughput. First
>> I'll describe my Solr setup, then what I've tried to optimize the  
>> system.
>>
>> I have 10 million records and 59 fields (all are indexed, 37 are  
>> stored, 17
>> have termVectors, 33 are multi-valued) which takes about 15GB of  
>> disk space.
>> Most field values are very short (single word or number), and  
>> usually about
>> half the fields have any data at all. I'm running on an 8-core, 64- 
>> bit, 32GB
>> RAM Redhat box. I allocate about 24GB of memory to the java  
>> process, and my
>> filterCache size is 700,000. I'm using a version of Solr between  
>> 1.3 and the
>> current trunk (including the latest SOLR-667 (FastLRUCache) patch),  
>> and
>> Tomcat 6.0.
>>
>> I'm running a ramp-test, increasing the number of users every few  
>> minutes. I
>> measure the maximum number of requests that Solr can handle per  
>> second with
>> a fixed response time, and call that my throughput. I'd like to see  
>> a single
>> physical resource be maxed out at some point during my test so I  
>> know it is
>> my bottle neck. I generated random queries for my dataset  
>> representing a
>> more or less realistic scenario. The queries include faceting by up  
>> to 6
>> fields, and quering by up to 8 fields.
>>
>> I ran a baseline on the un-optimized setup, and saw peak CPU usage  
>> of about
>> 50%, IO usage around 5%, and negligible network traffic.  
>> Interestingly, the
>> CPU peaked when I had 8 concurrent users, and actually dropped down  
>> to about
>> 40% when I increased the users beyond 8. Is that because I have 8  
>> cores?
>>
>> I changed a few settings and observed the effect on throughput:
>>
>> 1. Increased filterCache size, and throughput increased by about  
>> 50%, but it
>> seems to peak.
>> 2. Put the entire index on a RAM disk, and significantly reduced  
>> the average
>> response time, but my throughput didn't change (i.e. even though my  
>> response
>> time was 10X faster, the maximum number of requests I could make  
>> per second
>> didn't increase). This makes no sense to me, unless there is  
>> another bottle
>> neck somewhere.
>> 3. Reduced the number of records in my index. The throughput  
>> increased, but
>> the shape of all my graphs stayed the same, and my CPU usage was  
>> identical.
>>
>> I have a few questions:
>> 1. Can I get more than 50% CPU utilization?
>> 2. Why does CPU utilization fall when I make more than 8 concurrent
>> requests?
>> 3. Is there an obvious bottleneck that I'm missing?
>> 4. Does Tomcat have any settings that affect Solr performance?
>>
>> Any input is greatly appreciated.

Mime
View raw message