lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Solr Merge during off peak times
Date Wed, 02 May 2012 12:15:13 GMT
But again, with a master/slave setup merging should
be relatively benign. And at 200M docs, having a M/S
setup is probably indicated.

Here's a good write-up of merge policy internals:
http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/

If you're indexing and searching on a single machine, merging
is much less important than how often you commit. In an M/S
setup, your polling interval on the slave is what matters.

I'd look at commit frequency long before I worried about merging;
that's usually where people shoot themselves in the foot, by
committing too often.
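
To make "don't commit too often" concrete, here is a minimal sketch assuming a hypothetical `CommitBatcher` class (not a Solr or SolrJ API). It buffers documents and hands each batch to a caller-supplied commit action, which stands in for whatever client actually sends the documents and issues the commit. A commit happens only when `maxDocs` documents are pending or the oldest pending document has waited `maxAgeMillis`, rather than once per document. Solr's own `autoCommit` settings (`maxDocs`, `maxTime`) in solrconfig.xml serve a similar purpose on the server side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not a Solr/SolrJ API): buffers documents and commits
 * them in batches, either when maxDocs are pending or when the oldest
 * pending document has waited at least maxAgeMillis.
 */
class CommitBatcher<T> {
    private final int maxDocs;
    private final long maxAgeMillis;
    private final Consumer<List<T>> commitAction; // stand-in for "send batch + commit"
    private final List<T> pending = new ArrayList<>();
    private long oldestPendingMillis;
    private int commits;

    CommitBatcher(int maxDocs, long maxAgeMillis, Consumer<List<T>> commitAction) {
        this.maxDocs = maxDocs;
        this.maxAgeMillis = maxAgeMillis;
        this.commitAction = commitAction;
    }

    /** Buffers one document; the clock is passed in so the logic is testable. */
    void add(T doc, long nowMillis) {
        if (pending.isEmpty()) {
            oldestPendingMillis = nowMillis;
        }
        pending.add(doc);
        if (pending.size() >= maxDocs || nowMillis - oldestPendingMillis >= maxAgeMillis) {
            flush();
        }
    }

    /** Commits whatever is pending (call on shutdown, or periodically from a timer). */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        commitAction.accept(new ArrayList<>(pending));
        pending.clear();
        commits++;
    }

    int commitCount() {
        return commits;
    }
}
```

Fed at the peak rate mentioned later in the thread (20 docs/sec, i.e. one every 50 ms) for one minute with `maxDocs = 1000` and `maxAgeMillis = 10_000`, followed by a final `flush()`, this issues 6 commits instead of 1,200. The age check only runs when a document arrives, so a real indexer would also call `flush()` from a timer.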

Overall, your mergeFactor is probably less important than other
parts of how you perform indexing/searching, but it does have
some effect for sure...
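
The trade-off behind mergeFactor can be illustrated with a toy model. This is a deliberate simplification and NOT Lucene's LogMergePolicy code (the real policy works on segment byte/doc sizes, honors limits such as maxMergeDocs, and TieredMergePolicy behaves differently): every flush writes one level-0 segment, and whenever `mergeFactor` segments share a level they are merged into one segment a level up. The hypothetical `ToyLogMerge` class counts merges and the documents those merges rewrite, a rough proxy for merge I/O.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of a log-style merge policy (a simplification, NOT Lucene's
 * LogMergePolicy): each flush writes one level-0 segment of flushDocs docs;
 * whenever mergeFactor segments share a level they merge into one segment
 * at the next level.
 */
class ToyLogMerge {
    private final int mergeFactor;
    private final long flushDocs;
    private final List<Integer> segmentsPerLevel = new ArrayList<>();
    long merges;
    long docsRewritten; // docs copied by merges: a rough proxy for merge I/O

    ToyLogMerge(int mergeFactor, long flushDocs) {
        this.mergeFactor = mergeFactor;
        this.flushDocs = flushDocs;
    }

    void flush() {
        addSegment(0);
    }

    private void addSegment(int level) {
        while (segmentsPerLevel.size() <= level) {
            segmentsPerLevel.add(0);
        }
        int count = segmentsPerLevel.get(level) + 1;
        if (count < mergeFactor) {
            segmentsPerLevel.set(level, count);
            return;
        }
        // mergeFactor segments at this level: merge them into one at level + 1
        segmentsPerLevel.set(level, 0);
        merges++;
        docsRewritten += mergeFactor * docsPerSegment(level);
        addSegment(level + 1);
    }

    private long docsPerSegment(int level) {
        long docs = flushDocs;
        for (int i = 0; i < level; i++) {
            docs *= mergeFactor;
        }
        return docs;
    }

    int segmentCount() {
        int total = 0;
        for (int c : segmentsPerLevel) {
            total += c;
        }
        return total;
    }
}
```

In this model, 999 flushes of 1,000 docs leave 27 segments after 108 merges that rewrite 1,890,000 docs with mergeFactor 10, versus 15 segments after 246 merges that rewrite 3,470,000 docs with mergeFactor 5. So a lower mergeFactor buys fewer segments (usually faster searches) at the cost of more merging, which is exactly what a 10-vs-5 test on real hardware would need to weigh.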

Best
Erick

On Wed, May 2, 2012 at 7:54 AM, Prakashganesh, Prabhu
<Prabhu.Prakashganesh@dowjones.com> wrote:
> We have a fairly large-scale system: about 200 million docs and fairly high indexing
> activity, about 300k docs per day with peak ingestion rates of about 20 docs per sec. I want
> to work out what a good mergeFactor setting would be by testing different mergeFactor
> settings. I think the default of 10 might be too high; I want to try 5 and compare. Unless
> I know when a merge starts and finishes, it would be quite difficult to work out the impact
> of changing mergeFactor. I want to be able to measure how long merges take, run queries during
> the merge activity, see what the response times are, etc.
>
> Thanks
> Prabhu
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: 02 May 2012 12:40
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Merge during off peak times
>
> Why do you care? Merging is generally a background process, or are
> you doing heavy indexing? In a master/slave setup,
> it's usually not really relevant except that (with 3.x), massive merges
> may temporarily stop indexing. Is that the problem?
>
> Look at the merge policies; there are configurations that make
> this less painful.
>
> In trunk, DocumentsWriterPerThread makes merges happen in the
> background, which helps the long-pause-while-indexing problem.
>
> Best
> Erick
>
> On Wed, May 2, 2012 at 7:22 AM, Prakashganesh, Prabhu
> <Prabhu.Prakashganesh@dowjones.com> wrote:
>> Ok, thanks Otis
>> Another question on merging
>> What is the best way to monitor merging?
>> Is there something in the log file that I can look for?
>> It seems like I have to monitor the system resources (read/write IOPS, etc.) and
>> work out when a merge happened.
>> It would be great if I could do it by looking at log files or in the admin UI. Do you
>> know if this can be done, or if there is some tool for this?
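
One rough way to see merges without watching IOPS is to poll the index directory. The sketch below rests on an assumption about the on-disk layout: Lucene index files are prefixed with their segment name (`_0.fdt`, `_0_1.del`, `_1a.cfs`), so when previously seen segment names disappear, a merge has most likely been committed and its source segments deleted. `SegmentWatcher` is a hypothetical helper, not a Solr or Lucene tool, and the signal can lag: old files typically go away only once nothing (the last commit point, or an open reader on some filesystems) still references them.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;

/**
 * Heuristic merge detector (hypothetical helper, not a Solr/Lucene tool):
 * diffs the set of segment names found in an index directory between polls.
 */
class SegmentWatcher {
    /** Distinct segment names; skips segments_N, segments.gen, write.lock. */
    static Set<String> segmentNames(Path indexDir) throws IOException {
        Set<String> names = new TreeSet<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                if (!name.startsWith("_")) {
                    continue;
                }
                // The segment name runs from the leading '_' to the next '.' or '_'
                int end = 1;
                while (end < name.length() && name.charAt(end) != '.' && name.charAt(end) != '_') {
                    end++;
                }
                names.add(name.substring(0, end));
            }
        }
        return names;
    }

    /** Segments present in the earlier snapshot but gone from the later one. */
    static Set<String> vanished(Set<String> before, Set<String> after) {
        Set<String> gone = new TreeSet<>(before);
        gone.removeAll(after);
        return gone;
    }
}
```

Calling `segmentNames` every few seconds (for example from a `ScheduledExecutorService`) and logging a timestamp whenever `vanished` is non-empty gives an approximate merge timeline to line up against query response times. For finer detail, Lucene's IndexWriter infoStream, which Solr 3.x can enable through the `infoStream` setting in solrconfig.xml, logs merge activity directly and is probably the better first stop.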
>>
>> Thanks
>> Prabhu
>>
>> -----Original Message-----
>> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
>> Sent: 01 May 2012 15:12
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Merge during off peak times
>>
>> Hi Prabhu,
>>
>> I don't think such a merge policy exists, but it would be nice to have this option,
>> and I imagine it wouldn't be hard to write if you really just base the merge/no-merge
>> decision on the time of day (and maybe the day of the week).
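
The time-of-day decision itself is small. Here is a minimal sketch of just that part, assuming a hypothetical `OffPeakWindow` class: a daily window [start, end) that may wrap midnight, with weekends optionally treated as off-peak all day. Wiring it into Lucene would mean a custom MergePolicy that consults the gate before delegating to a normal policy; that part is not shown and its details are left as an assumption.

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.time.LocalTime;

/**
 * Hypothetical "may we merge now?" gate: a daily window [start, end) that
 * may wrap midnight, plus optional all-day weekends. The caller chooses the
 * time zone when producing the LocalDateTime.
 */
class OffPeakWindow {
    private final LocalTime start;
    private final LocalTime end;
    private final boolean weekendsOffPeak;

    OffPeakWindow(LocalTime start, LocalTime end, boolean weekendsOffPeak) {
        this.start = start;
        this.end = end;
        this.weekendsOffPeak = weekendsOffPeak;
    }

    boolean allowsMerging(LocalDateTime now) {
        DayOfWeek day = now.getDayOfWeek();
        if (weekendsOffPeak && (day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY)) {
            return true;
        }
        LocalTime t = now.toLocalTime();
        if (start.isBefore(end)) {
            // Same-day window, e.g. 01:00-05:00
            return !t.isBefore(start) && t.isBefore(end);
        }
        // Window wraps midnight, e.g. 22:00-06:00 (start == end means "always")
        return !t.isBefore(start) || t.isBefore(end);
    }
}
```

With a 22:00-06:00 window, merging would be allowed at 23:30 or 05:59 on a weekday but not at 12:00 or 06:00. One consequence to keep in mind: deferring merges lets segments accumulate during peak hours, which would likely slow searches in exactly the period the policy is meant to protect.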
>>
>> Note that this should go into Lucene, not Solr, so if you decide to contribute your
>> work, please see http://wiki.apache.org/lucene-java/HowToContribute
>>
>> Otis
>> ----
>> Performance Monitoring for Solr - http://sematext.com/spm
>>
>>
>>
>>
>>>________________________________
>>> From: "Prakashganesh, Prabhu" <Prabhu.Prakashganesh@dowjones.com>
>>>To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
>>>Sent: Tuesday, May 1, 2012 8:45 AM
>>>Subject: Solr Merge during off peak times
>>>
>>>Hi,
>>>I would like to know if there is a way to configure the index merge policy in Solr
>>>so that merging happens during off-peak hours. Can you please let me know if such a merge
>>>policy configuration exists?
>>>
>>>Thanks
>>>Prabhu