Subject: Re: strange performance issue with many shards on one server
From: Toke Eskildsen
Reply-To: te@statsbiblioteket.dk
To: "solr-user@lucene.apache.org"
Organization: State and University Library, Denmark
Date: Wed, 28 Sep 2011 16:40:25 +0200
Message-ID: <1317220825.3165.72.camel@te-prime>

On Wed, 2011-09-28 at 12:58 +0200, Frederik Kraus wrote:
> - 10 shards per server (needed for response times) running in a single tomcat instance

Have you tested that sharding actually decreases response times in your case? I see the idea of using sharding to lower response times at the cost of throughput, but the added overhead of merging is non-trivial.

> - each query queries all 20 shards (distributed search)
>
> - each shard holds about 1.5 mio documents (small shards are needed due to rather complex queries)
> - all caches are warmed / high cache hit rates (99%) etc.

> Now for some reason we cannot seem to fully utilize all CPU power (no disk IO), ie. increasing concurrent users doesn't increase CPU-Load at a point, decreases throughput and increases the response times of the individual queries.

It sounds as if there's a hard limit on the number of concurrent users somewhere. I am no expert in httpclient, but the blocked threads in your thread dump seem to indicate that they are waiting for connections to be established rather than for results to be produced. I seem to remember that tomcat has a default limit of 200 concurrent connections, and with 10 shards/search that is just 200 / (10 shard_connections + 1 incoming_connection) = 18 concurrent searches.

> Also 1-2% of the queries take significantly longer: avg somewhere at 100ms while 1-2% take 1.5s or longer.

Could be garbage collection, especially since it shows up under high load, which might result in more old objects and thereby trigger a full GC.
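
The connection estimate above (200 / (10 + 1) = 18) can be sketched in a few lines of Python. This is a back-of-the-envelope illustration, not a measurement: the function name and parameters are hypothetical, the limit of 200 is the remembered default from the message rather than a value read from an actual configuration, and it assumes every search holds one incoming connection plus one connection per local shard for its whole duration.

```python
def max_concurrent_searches(connection_limit: int, shards_per_search: int) -> int:
    """Estimate how many distributed searches fit under a connection limit.

    Assumption: each search occupies one incoming connection plus one
    connection for every shard it fans out to on the same container.
    """
    connections_per_search = shards_per_search + 1
    # Integer division: a search that cannot get all its connections blocks.
    return connection_limit // connections_per_search


# Figures from the message: assumed limit of 200, 10 shards per tomcat.
print(max_concurrent_searches(200, 10))  # prints 18
```

If the estimate holds, raising the container's connection limit (or lowering the number of shards per instance) should move the point at which throughput stops scaling, which makes it a cheap hypothesis to test.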