Message-ID: <54E3546F.7070602@elyograg.org>
Date: Tue, 17 Feb 2015 07:47:11 -0700
From: Shawn Heisey <apache@elyograg.org>
To: solr-user@lucene.apache.org
Subject: Re: Too many merges, stalling...

On 2/16/2015 8:12 PM, ralph tice wrote:
> Recently I turned on INFO level logging in order to get better insight
> as to what our Solr cluster is doing.  Sometimes as frequently as
> almost 3 times a second we get messages like:
> [CMS][qtp896644936-33133]: too many merges; stalling...
> 
> Less frequently we get:
> [TMP][commitScheduler-8-thread-1]:
> seg=_5dy(4.10.3):C13520226/1044084:delGen=318 size=2784.291 MB [skip:
> too large]
> 
> where size is 2500-4900MB.

I've trimmed most of your original message, but I will refer to things
you mentioned in the parts I did not quote.

The first message simply indicates that you have reached more
simultaneous merges than CMS (ConcurrentMergeScheduler) is configured to
allow (3 by default), so it will stall all of them except one.  The
javadocs say that the one allowed to run will be the smallest, but I have
observed the opposite -- the one that is allowed to run is always the
largest.

The second message comes from TMP (TieredMergePolicy).  It indicates that
the merge under consideration would have exceeded the maximum merged
segment size, which defaults to 5GB, so it refused to do that merge.

The mergeFactor setting is deprecated, but still works for now in 4.x
releases.  The reason your merges are happening so frequently is that
you have set this to a low value (5).  Setting it to a larger value will
make merges less frequent.

The mergeFactor value is used to set maxMergeAtOnce and segmentsPerTier.
A proper TieredMergePolicy config will have those two settings
(normally set to the same value) as well as maxMergeAtOnceExplicit,
which should be set to three times the value of the other two.  My
config uses 35, 35, and 105 for those three settings, respectively.
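
As a rough sketch, assuming a 4.x solrconfig.xml where the merge policy
is configured inside <indexConfig>, those three values would look
something like this.  Treat it as a starting point to check against the
example config for your version, not a drop-in section:

```xml
<indexConfig>
  <!-- Replaces the deprecated mergeFactor setting. -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <!-- These two are normally set to the same value. -->
    <int name="maxMergeAtOnce">35</int>
    <int name="segmentsPerTier">35</int>
    <!-- Three times the value of the two settings above. -->
    <int name="maxMergeAtOnceExplicit">105</int>
  </mergePolicy>
</indexConfig>
```

If you configure the policy this way, leave mergeFactor out so the two
do not conflict.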

You can also allow more simultaneous merges in the CMS config.  I use a
value of 6 here, to avoid lengthy indexing stalls that will kill the DIH
connection to MySQL.  If the disks are standard spinning magnetic disks,
the number of CMS threads should be one.  If it's SSD, you can use more
threads.
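
A sketch of the matching scheduler section, assuming spinning disks and
a 4.x solrconfig.xml.  I am assuming here that the "simultaneous merges"
value maps to maxMergeCount and the thread count maps to maxThreadCount;
check the ConcurrentMergeScheduler javadocs for your version before
relying on it.  This element goes inside <indexConfig>:

```xml
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <!-- One merge thread for spinning disks; SSD can use more. -->
  <int name="maxThreadCount">1</int>
  <!-- How many merges may be pending before indexing is stalled. -->
  <int name="maxMergeCount">6</int>
</mergeScheduler>
```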

Thanks,
Shawn