Message-ID: <54E3546F.7070602@elyograg.org>
Date: Tue, 17 Feb 2015 07:47:11 -0700
From: Shawn Heisey
To: solr-user@lucene.apache.org
Subject: Re: Too many merges, stalling...

On 2/16/2015 8:12 PM, ralph tice wrote:
> Recently I turned on INFO level logging in order to get better insight
> as to what our Solr cluster is doing. Sometimes as frequently as
> almost 3 times a second we get messages like:
> [CMS][qtp896644936-33133]: too many merges; stalling...
>
> Less frequently we get:
> [TMP][commitScheduler-8-thread-1]:
> seg=_5dy(4.10.3):C13520226/1044084:delGen=318 size=2784.291 MB [skip:
> too large]
>
> where size is 2500-4900MB.

I've trimmed most of your original message, but I will refer to things
you have mentioned in the unquoted portion.

The first message simply indicates that you have reached more
simultaneous merges than CMS is configured to allow (3 by default), so
it will stall all of them except one. The javadocs say that the one
allowed to run will be the smallest, but I have observed the opposite --
the one that is allowed to run is always the largest.

The second message indicates that the merge under consideration would
have exceeded the maximum size, which defaults to 5GB, so it refused to
do that merge.

The mergeFactor setting is deprecated, but still works for now in 4.x
releases. The reason your merges are happening so frequently is that
you have set this to a low value - 5. Setting it to a larger value will
make merges less frequent.

The mergeFactor value is used to set maxMergeAtOnce and segmentsPerTier.
A proper TieredMergePolicy config will have those two settings (normally
set to the same value) as well as maxMergeAtOnceExplicit, which should
be set to three times the value of the other two. My config uses 35,
35, and 105 for each of those values, respectively.

You can also allow more simultaneous merges in the CMS config. I use a
value of 6 here, to avoid lengthy indexing stalls that will kill the DIH
connection to MySQL. If the disks are standard spinning magnetic disks,
the number of CMS threads should be one. If it's SSD, you can use more
threads.

Thanks,
Shawn
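For reference, the settings described above could be sketched in the
indexConfig section of solrconfig.xml roughly as follows. This is an
illustration for 4.x, not a copy of a tested config. The 35/35/105 and 6
values come from the message; mapping "simultaneous merges" to
maxMergeCount and "CMS threads" to maxThreadCount is an assumption, since
the message does not name those parameters. The thread count of 1 assumes
spinning disks.

```xml
<indexConfig>
  <!-- Explicit TieredMergePolicy settings in place of the deprecated mergeFactor. -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">35</int>
    <int name="segmentsPerTier">35</int>
    <!-- Three times the value of the two settings above. -->
    <int name="maxMergeAtOnceExplicit">105</int>
  </mergePolicy>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <!-- Assumed parameter name for the number of CMS threads:
         1 for spinning disks, possibly more on SSD. -->
    <int name="maxThreadCount">1</int>
    <!-- Assumed parameter name for how many merges may be queued
         before indexing stalls. -->
    <int name="maxMergeCount">6</int>
  </mergeScheduler>
</indexConfig>
```

If mergeFactor is still present in the same config, it would probably be
best to remove it so the explicit values above are the only source for
these settings.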