lucene-solr-user mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: Throughput Optimization
Date Wed, 05 Nov 2008 15:25:34 GMT
You're probably hitting some contention with the locking around the
reading of index files... this has been recently improved in Lucene
for non-Windows boxes, and we're integrating that into Solr (it should
definitely be in the next release).
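Roughly, the difference looks like this (an illustrative toy, not the actual
Lucene code; the class and method names here are made up): a shared file
handle forces seek+read to be serialized under a lock, while a positional
read (pread(2) on POSIX) carries its own offset and needs no lock at all:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ReadContention {
    private final FileChannel channel;

    ReadContention(Path file) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.READ);
    }

    // Locked variant: seek + read must be atomic on the shared file pointer,
    // so concurrent readers queue up behind one another.
    synchronized int lockedRead(byte[] dst, long offset) throws IOException {
        channel.position(offset);
        return channel.read(ByteBuffer.wrap(dst));
    }

    // Lock-free variant: the positional read passes its own offset, so
    // threads never contend on shared file-pointer state.
    int positionalRead(byte[] dst, long offset) throws IOException {
        return channel.read(ByteBuffer.wrap(dst), offset);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("seg", ".dat");
        Files.write(tmp, "hello index".getBytes());
        ReadContention r = new ReadContention(tmp);
        byte[] buf = new byte[5];
        r.positionalRead(buf, 6);
        System.out.println(new String(buf)); // prints "index"
        Files.delete(tmp);
    }
}
```

With many search threads hammering the same segment files, the locked
variant makes the lock, not the disk or CPU, the limiting resource.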

-Yonik

On Tue, Nov 4, 2008 at 9:01 PM, wojtekpia <wojtek_p@hotmail.com> wrote:
>
> I've been running load tests for the past week or two, and I can't figure out
> which bottleneck is preventing me from increasing throughput. First
> I'll describe my Solr setup, then what I've tried to optimize the system.
>
> I have 10 million records and 59 fields (all are indexed, 37 are stored, 17
> have termVectors, 33 are multi-valued), which takes about 15GB of disk space.
> Most field values are very short (a single word or number), and usually only
> about half the fields contain any data. I'm running on an 8-core, 64-bit, 32GB
> RAM Red Hat box. I allocate about 24GB of memory to the Java process, and my
> filterCache size is 700,000. I'm using a version of Solr between 1.3 and the
> current trunk (including the latest SOLR-667 (FastLRUCache) patch), and
> Tomcat 6.0.
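>
> For reference, that cache is configured in my solrconfig.xml along these
> lines (a sketch: only the size is the value I described above; the
> initialSize and autowarmCount values here are placeholders):
>
> ```xml
> <filterCache class="solr.FastLRUCache"
>              size="700000"
>              initialSize="700000"
>              autowarmCount="0"/>
> ```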
>
> I'm running a ramp-test, increasing the number of users every few minutes. I
> measure the maximum number of requests that Solr can handle per second with
> a fixed response time, and call that my throughput. I'd like to see a single
> physical resource be maxed out at some point during my test so I know it is
> my bottleneck. I generated random queries for my dataset representing a
> more or less realistic scenario. The queries include faceting by up to 6
> fields, and querying by up to 8 fields.
>
> I ran a baseline on the un-optimized setup, and saw peak CPU usage of about
> 50%, IO usage around 5%, and negligible network traffic. Interestingly, the
> CPU peaked when I had 8 concurrent users, and actually dropped down to about
> 40% when I increased the users beyond 8. Is that because I have 8 cores?
>
> I changed a few settings and observed the effect on throughput:
>
> 1. Increased filterCache size, and throughput increased by about 50%, but it
> seems to have plateaued.
> 2. Put the entire index on a RAM disk, which significantly reduced the average
> response time, but my throughput didn't change (i.e. even though my response
> time was 10X faster, the maximum number of requests I could make per second
> didn't increase). This makes no sense to me unless there is another
> bottleneck somewhere.
> 3. Reduced the number of records in my index. The throughput increased, but
> the shape of all my graphs stayed the same, and my CPU usage was identical.
>
> I have a few questions:
> 1. Can I get more than 50% CPU utilization?
> 2. Why does CPU utilization fall when I make more than 8 concurrent
> requests?
> 3. Is there an obvious bottleneck that I'm missing?
> 4. Does Tomcat have any settings that affect Solr performance?
>
> Any input is greatly appreciated.
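
The RAM-disk observation above (10X faster responses, unchanged throughput)
is exactly what a hidden serialized section would produce. A sketch of the
arithmetic, with hypothetical numbers:

```java
// Illustrative sketch (hypothetical numbers): if every request spends
// `serialMs` milliseconds inside a single serialized section (e.g. a
// global lock), aggregate throughput is capped at 1000/serialMs
// requests/sec no matter how fast the rest of the request becomes.
public class SerialCap {
    // Ceiling on requests/sec imposed by `serialMs` ms under a global lock.
    static double maxThroughput(double serialMs) {
        return 1000.0 / serialMs;
    }

    public static void main(String[] args) {
        double serialMs = 10.0; // assumed: 10 ms per request under the lock
        // Making everything *outside* the lock 10x faster (e.g. moving the
        // index to a RAM disk) leaves this ceiling untouched:
        System.out.println(maxThroughput(serialMs)); // prints 100.0
    }
}
```

So speeding up I/O shortens each response but cannot raise the cap; only
shrinking or removing the serialized section can, which is what the locking
fix described above does.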
