Date: Wed, 2 May 2012 08:15:13 -0400
Subject: Re: Solr Merge during off peak times
From: Erick Erickson
To: solr-user@lucene.apache.org

But again, with a master/slave setup merging should be relatively benign. And at 200M docs, a M/S setup is probably indicated.

Here's a good write-up of merge policy internals:
http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/

If you're indexing and searching on a single machine, merging is much less important than how often you commit. In a M/S setup, your polling interval on the slave is what matters.

I'd look at commit frequency long before I worried about merging; that's usually where people shoot themselves in the foot, by committing too often.

Overall, your mergeFactor is probably less important than other parts of how you perform indexing/searching, but it does have some effect for sure...

Best
Erick

On Wed, May 2, 2012 at 7:54 AM, Prakashganesh, Prabhu wrote:
> We have a fairly large-scale system: about 200 million docs and fairly high indexing activity, about 300k docs per day with peak ingestion rates of about 20 docs per sec.
> I want to work out what a good mergeFactor setting would be by testing with different mergeFactor settings. I think the default of 10 might be high; I want to try 5 and compare. Unless I know when a merge starts and finishes, it would be quite difficult to work out the impact of changing mergeFactor. I want to be able to measure how long merges take, run queries during the merge activity, see what the response times are, etc.
>
> Thanks
> Prabhu
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: 02 May 2012 12:40
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Merge during off peak times
>
> Why do you care? Merging is generally a background process, or are
> you doing heavy indexing? In a master/slave setup,
> it's usually not really relevant except that (with 3.x) massive merges
> may temporarily stop indexing. Is that the problem?
>
> Look at the merge policies; there are configurations that make
> this less painful.
>
> In trunk, DocumentsWriterPerThread makes merges happen in the
> background, which helps the long-pause-while-indexing problem.
>
> Best
> Erick
>
> On Wed, May 2, 2012 at 7:22 AM, Prakashganesh, Prabhu
> wrote:
>> OK, thanks, Otis.
>> Another question on merging:
>> What is the best way to monitor merging?
>> Is there something in the log file that I can look for?
>> It seems like I have to monitor the system resources (read/write IOPS etc.) and work out when a merge happened.
>> It would be great if I could do it by looking at log files or in the admin UI. Do you know if this can be done, or if there is some tool for this?
>>
>> Thanks
>> Prabhu
>>
>> -----Original Message-----
>> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
>> Sent: 01 May 2012 15:12
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Merge during off peak times
>>
>> Hi Prabhu,
>>
>> I don't think such a merge policy exists, but it would be nice to have this option, and I imagine it wouldn't be hard to write if you really just base the merge/no-merge decision on the time of day (and maybe the day of the week).
>>
>> Note that this should go into Lucene, not Solr, so if you decide to contribute your work, please see http://wiki.apache.org/lucene-java/HowToContribute
>>
>> Otis
>> ----
>> Performance Monitoring for Solr - http://sematext.com/spm
>>
>>> ________________________________
>>> From: "Prakashganesh, Prabhu"
>>> To: "solr-user@lucene.apache.org"
>>> Sent: Tuesday, May 1, 2012 8:45 AM
>>> Subject: Solr Merge during off peak times
>>>
>>> Hi,
>>> I would like to know if there is a way to configure the index merge policy in Solr so that merging happens during off-peak hours. Can you please let me know if such a merge policy configuration exists?
>>>
>>> Thanks
>>> Prabhu
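Erick's advice to look at commit frequency before merging can be made concrete with the figure from the thread (about 300k docs per day). This is a back-of-envelope sketch under two stated assumptions: the load is spread evenly over the day, and each commit flushes the buffered documents as one new segment. `CommitMath` is a hypothetical helper, not part of Solr or Lucene.

```java
/** Hypothetical back-of-envelope helper: commit interval vs. commits and segment sizes. */
final class CommitMath {
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    /** Commits per day at a fixed commit interval (in seconds). */
    static long commitsPerDay(long intervalSeconds) {
        return SECONDS_PER_DAY / intervalSeconds;
    }

    /** Average docs per commit, i.e. per newly flushed segment, assuming an even daily load. */
    static double docsPerCommit(long docsPerDay, long intervalSeconds) {
        return (double) docsPerDay / commitsPerDay(intervalSeconds);
    }

    public static void main(String[] args) {
        long docsPerDay = 300_000; // figure quoted in the thread
        for (long interval : new long[] {1, 60, 900}) {
            System.out.printf("commit every %ds: %d commits/day, ~%.0f docs per new segment%n",
                    interval, commitsPerDay(interval), docsPerCommit(docsPerDay, interval));
        }
    }
}
```

Under these assumptions, committing every 60 seconds means 1,440 commits a day at roughly 208 docs each, while a 15-minute interval means 96 commits of 3,125 docs each. The more often you commit, the more small segments the merge policy has to fold together later. Real peaks (20 docs/sec) make individual segments larger or smaller than these averages.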
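Prabhu's plan to compare a mergeFactor of 10 against 5 can be previewed with a toy model before measuring a real index. The sketch assumes a simplified logarithmic merge scheme: every flush creates one segment, and whenever mergeFactor segments accumulate at the same level they are merged into one segment at the next level. Lucene's actual policies (for example LogByteSizeMergePolicy or TieredMergePolicy) size segments by bytes and have more settings, so `MergeFactorSim` is a hypothetical illustration whose numbers are only qualitative.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of logarithmic merging; not Lucene's actual merge policy code. */
final class MergeFactorSim {
    long merges = 0;       // number of merge operations performed
    long docsCopied = 0;   // docs rewritten by merges (rough proxy for merge I/O)
    int peakSegments = 0;  // highest segment count seen after any flush
    // levels.get(i) holds the doc counts of the segments currently at level i
    private final List<List<Long>> levels = new ArrayList<>();
    private final int mergeFactor;

    MergeFactorSim(int mergeFactor) {
        this.mergeFactor = mergeFactor;
    }

    /** Flush one new segment of the given size, cascade merges, then update the peak. */
    void flush(long docs) {
        addToLevel(0, docs);
        peakSegments = Math.max(peakSegments, segmentCount());
    }

    private void addToLevel(int level, long docs) {
        while (levels.size() <= level) {
            levels.add(new ArrayList<>());
        }
        List<Long> segs = levels.get(level);
        segs.add(docs);
        if (segs.size() == mergeFactor) {
            // mergeFactor segments at this level: merge them into one at the next level.
            long merged = 0;
            for (long d : segs) {
                merged += d;
            }
            segs.clear();
            merges++;
            docsCopied += merged;
            addToLevel(level + 1, merged);
        }
    }

    int segmentCount() {
        int n = 0;
        for (List<Long> segs : levels) {
            n += segs.size();
        }
        return n;
    }

    public static void main(String[] args) {
        for (int mf : new int[] {10, 5}) {
            MergeFactorSim sim = new MergeFactorSim(mf);
            for (int i = 0; i < 1000; i++) {
                sim.flush(100); // 1000 flushes of 100 docs each
            }
            System.out.printf("mergeFactor=%d merges=%d docsCopied=%d peakSegments=%d%n",
                    mf, sim.merges, sim.docsCopied, sim.peakSegments);
        }
    }
}
```

With 1000 flushes of 100 docs, the model gives 111 merges rewriting 300,000 docs at mergeFactor 10, versus 249 merges rewriting 362,500 docs at mergeFactor 5, while the peak segment count drops from 27 to 16. In this model the lower setting trades more merge I/O for fewer segments to search, which is the effect a real test would need to quantify.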
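The thread does not settle how to detect merges, so the following only sketches the bookkeeping Prabhu describes: record when each merge starts and finishes, then check whether a given query ran while a merge was active. Where the start/finish calls come from is an open assumption (for example, a wrapper around whatever merge scheduler your Lucene version uses). `MergeTimeline` and its methods are hypothetical names, and timestamps are passed in explicitly so the class is easy to test.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical recorder: call mergeStarted/mergeFinished from whatever hook you have. */
final class MergeTimeline {
    private final Map<String, Long> running = new HashMap<>();  // mergeId -> start millis
    private final List<long[]> finished = new ArrayList<>();    // {startMillis, endMillis}

    synchronized void mergeStarted(String mergeId, long nowMillis) {
        running.put(mergeId, nowMillis);
    }

    synchronized void mergeFinished(String mergeId, long nowMillis) {
        Long start = running.remove(mergeId);
        if (start != null) {
            finished.add(new long[] {start, nowMillis});
        }
    }

    /** True if any merge was running at time t; use it to tag query response times. */
    synchronized boolean mergeActiveAt(long t) {
        for (long start : running.values()) {
            if (start <= t) {
                return true;
            }
        }
        for (long[] m : finished) {
            if (m[0] <= t && t < m[1]) {
                return true;
            }
        }
        return false;
    }

    /** Duration of the longest completed merge, in milliseconds. */
    synchronized long longestMergeMillis() {
        long max = 0;
        for (long[] m : finished) {
            max = Math.max(max, m[1] - m[0]);
        }
        return max;
    }

    public static void main(String[] args) {
        MergeTimeline t = new MergeTimeline();
        t.mergeStarted("m1", 1_000);
        t.mergeFinished("m1", 4_000);
        t.mergeStarted("m2", 10_000); // still running
        System.out.println(t.longestMergeMillis());
        System.out.println(t.mergeActiveAt(2_000));
        System.out.println(t.mergeActiveAt(5_000));
    }
}
```

Logging each query's latency together with `mergeActiveAt(queryStartMillis)` would let you compare response times during and outside merges for each mergeFactor you try.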
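Otis's suggestion, basing the merge/no-merge decision on the time of day and maybe the day of the week, comes down to a small predicate. The sketch shows only that decision in plain Java. The class name `OffPeakWindow`, the window boundaries, and the "weekends all day" option are hypothetical. Wiring it into Lucene (for example, a custom merge policy that declines to select merges outside the window) is not shown and would depend on your Lucene version.

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.time.LocalTime;

/** Hypothetical helper: decides whether a moment falls inside an off-peak merge window. */
final class OffPeakWindow {
    private final LocalTime start;          // window opens (inclusive)
    private final LocalTime end;            // window closes (exclusive)
    private final boolean weekendsAllDay;   // treat Saturday and Sunday as fully off-peak

    OffPeakWindow(LocalTime start, LocalTime end, boolean weekendsAllDay) {
        this.start = start;
        this.end = end;
        this.weekendsAllDay = weekendsAllDay;
    }

    boolean allowsMerge(LocalDateTime now) {
        DayOfWeek day = now.getDayOfWeek();
        if (weekendsAllDay && (day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY)) {
            return true;
        }
        LocalTime t = now.toLocalTime();
        if (start.isBefore(end)) {
            // Same-day window, e.g. 01:00-05:00.
            return !t.isBefore(start) && t.isBefore(end);
        }
        // Window wraps past midnight, e.g. 22:00-06:00 (start == end is treated as "all day").
        return !t.isBefore(start) || t.isBefore(end);
    }

    public static void main(String[] args) {
        OffPeakWindow w = new OffPeakWindow(LocalTime.of(22, 0), LocalTime.of(6, 0), true);
        System.out.println(w.allowsMerge(LocalDateTime.of(2012, 5, 2, 23, 30))); // Wednesday night
        System.out.println(w.allowsMerge(LocalDateTime.of(2012, 5, 2, 12, 0)));  // Wednesday midday
        System.out.println(w.allowsMerge(LocalDateTime.of(2012, 5, 5, 12, 0)));  // Saturday midday
    }
}
```

One design consequence worth noting: if merges are simply skipped during peak hours, segments keep accumulating until the window opens, so the first off-peak merges may be large.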