lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Solr Merge during off peak times
Date Wed, 02 May 2012 14:26:08 GMT
Optimizing matters much less for query speed
than it did historically; essentially, it's not
recommended much any more.

A significant effect of optimize _used_ to be purging
obsolete data (i.e., data from deleted docs) from the
index, but that is now done on merge.

There's no harm in optimizing during off-peak hours, and
combined with an appropriate merge policy that may make
indexing a little better (I'm thinking of not having to do
as many massive merges here).
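Optimizing during off-peak hours (for example at the weekend, after the weekly purge discussed below) mostly comes down to scheduling. The sketch below only works out when the next off-peak slot starts; the Sunday 02:00 slot is an assumed example, and the optimize call itself (for instance SolrJ's `optimize()` on a `SolrServer`) is deliberately left out.

```java
import java.time.DayOfWeek;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.temporal.TemporalAdjusters;

public class WeekendOptimizeSchedule {

    // Hypothetical off-peak slot: Sunday at 02:00 local time. Adjust to your own traffic.
    static final DayOfWeek SLOT_DAY = DayOfWeek.SUNDAY;
    static final LocalTime SLOT_TIME = LocalTime.of(2, 0);

    /** Returns the first slot start that is at or after {@code now}. */
    static LocalDateTime nextSlot(LocalDateTime now) {
        LocalDateTime candidate = now.with(TemporalAdjusters.nextOrSame(SLOT_DAY)).with(SLOT_TIME);
        // If today is the slot day but the slot time has already passed, move on a week.
        if (candidate.isBefore(now)) {
            candidate = candidate.plusWeeks(1);
        }
        return candidate;
    }

    /** Delay until the next slot, suitable for handing to a scheduler. */
    static Duration delayUntilNextSlot(LocalDateTime now) {
        return Duration.between(now, nextSlot(now));
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.of(2012, 5, 2, 14, 26); // a Wednesday
        System.out.println(nextSlot(now));           // 2012-05-06T02:00
        System.out.println(delayUntilNextSlot(now)); // PT83H34M
    }
}
```

The resulting delay could be passed to something like `ScheduledExecutorService.schedule(task, delay.toMillis(), TimeUnit.MILLISECONDS)`, where `task` is whatever issues the optimize.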

BTW, in 4.0, DocumentsWriterPerThread lets indexing
threads flush new segments concurrently in the background,
which pretty much removes even this as a motivation for
optimizing.

All that said, optimizing isn't _bad_, it's just often
unnecessary.

Best
Erick

On Wed, May 2, 2012 at 9:29 AM, Prakashganesh, Prabhu
<Prabhu.Prakashganesh@dowjones.com> wrote:
> Actually we are not thinking of a M/S setup
> We are planning to have x number of shards on N number of servers, each shard
> handling both indexing and searching
> The expected query volume is not that high, so we don't think we would need to replicate
> to slaves. We think each shard will be able to handle its share of the indexing and searching.
> If we need to scale query capacity in future, yeah, we'd probably need to do it by replicating each
> shard to its slaves
>
> I agree autoCommit settings would be good to set up appropriately
>
> Another question I had is the pros/cons of optimising the index. We will be purging old
> content every week, and I am wondering whether to run an index optimise at the weekend after purging
> old data. Because we are going to be continuously indexing a mix of adds,
> updates and deletes, I'm not sure the benefit of optimising would last long enough to be worth
> doing. Maybe setting a low mergeFactor would be good enough. Optimising makes sense if
> the index is more static, perhaps? Thoughts?
>
> Thanks
> Prabhu
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: 02 May 2012 13:15
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Merge during off peak times
>
> But again, with a master/slave setup merging should
> be relatively benign. And at 200M docs, having a M/S
> setup is probably indicated.
>
> Here's a good writeup of merge policy:
> http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/
>
> If you're indexing and searching on a single machine, merging
> is much less important than how often you commit. In a M/S
> setup, your polling interval on the slave is what's important.
>
> I'd look at commit frequency long before I worried about merging,
> that's usually where people shoot themselves in the foot - by
> committing too often.
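To see why commit frequency dominates, it helps to count commits. The following is a toy model that only loosely mirrors the idea behind Solr's `autoCommit` `maxDocs`/`maxTime` thresholds (real Solr uses a timer rather than checking on each add); the 20 docs/sec rate is the peak quoted in this thread, and the thresholds are arbitrary examples.

```java
public class CommitFrequencyModel {

    /**
     * Counts commits for docs arriving every {@code intervalMs} milliseconds under a
     * simplified threshold policy (hypothetical, not Solr's implementation): commit once
     * maxDocs docs are pending, or when an arriving doc is at least maxTimeMs newer than
     * the oldest pending doc. Any leftover docs get one final commit.
     */
    static long countCommits(int totalDocs, long intervalMs, int maxDocs, long maxTimeMs) {
        long commits = 0;
        int pending = 0;
        long firstPendingAt = 0;
        for (int i = 0; i < totalDocs; i++) {
            long now = i * intervalMs;
            if (pending == 0) {
                firstPendingAt = now; // this doc starts a new uncommitted batch
            }
            pending++;
            if (pending >= maxDocs || now - firstPendingAt >= maxTimeMs) {
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) {
            commits++; // final commit for whatever is still buffered
        }
        return commits;
    }

    public static void main(String[] args) {
        int docsPerHour = 20 * 3600; // ~20 docs/sec sustained for an hour
        System.out.println(countCommits(docsPerHour, 50, 1, Long.MAX_VALUE)); // commit every doc
        System.out.println(countCommits(docsPerHour, 50, 1000, 60_000));      // batched
    }
}
```

At 20 docs/sec for an hour, committing every document means 72,000 commits, while a 1,000-doc threshold brings that down to 72.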
>
> Overall, your mergeFactor is probably less important than other
> parts of how you perform indexing/searching, but it does have
> some effect for sure...
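The mergeFactor tradeoff can be illustrated with a deliberately simplified model: each flush writes a one-unit segment, and whenever `mergeFactor` segments of the same size exist they are merged into one. This is not Lucene's `LogMergePolicy` code (which sizes segments by bytes or doc count and has other limits); it just counts live segments (more segments means more to search) and how much data merges rewrite (more rewriting means more indexing I/O).

```java
import java.util.ArrayList;
import java.util.List;

public class MergeFactorModel {

    /** Outcome of the toy simulation: live segments and flush units rewritten by merges. */
    public record Result(int segments, long unitsRewritten) {}

    /** Toy log-style merging; a hypothetical model, not Lucene's implementation. */
    public static Result simulate(int mergeFactor, int flushes) {
        if (mergeFactor < 2) throw new IllegalArgumentException("mergeFactor must be >= 2");
        List<Integer> countsPerLevel = new ArrayList<>();
        countsPerLevel.add(0);
        long unitsRewritten = 0;
        for (int f = 0; f < flushes; f++) {
            countsPerLevel.set(0, countsPerLevel.get(0) + 1); // new flushed segment
            long segmentSize = 1; // size, in flush units, of a segment at the current level
            int level = 0;
            while (countsPerLevel.get(level) == mergeFactor) {
                countsPerLevel.set(level, 0);
                segmentSize *= mergeFactor;    // merged segment is mergeFactor times larger
                unitsRewritten += segmentSize; // a merge rewrites every unit it contains
                if (level + 1 == countsPerLevel.size()) countsPerLevel.add(0);
                countsPerLevel.set(level + 1, countsPerLevel.get(level + 1) + 1);
                level++;
            }
        }
        int segments = countsPerLevel.stream().mapToInt(Integer::intValue).sum();
        return new Result(segments, unitsRewritten);
    }

    public static void main(String[] args) {
        for (int mf : new int[] {5, 10}) {
            Result r = simulate(mf, 999);
            System.out.printf("mergeFactor=%d -> segments=%d, units rewritten=%d%n",
                    mf, r.segments(), r.unitsRewritten());
        }
    }
}
```

For 999 flushes, this model gives 15 segments and 3,470 units rewritten with mergeFactor 5, versus 27 segments and 1,890 units rewritten with mergeFactor 10: a lower mergeFactor means fewer segments to search, paid for with more merge I/O.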
>
> Best
> Erick
>
> On Wed, May 2, 2012 at 7:54 AM, Prakashganesh, Prabhu
> <Prabhu.Prakashganesh@dowjones.com> wrote:
>> We have a fairly large-scale system - about 200 million docs and fairly high indexing
>> activity - about 300k docs per day with peak ingestion rates of about 20 docs per sec. I want
>> to work out what a good mergeFactor setting would be by testing with different mergeFactor
>> settings. I think the default of 10 might be high; I want to try 5 and compare. Unless
>> I know when a merge starts and finishes, it would be quite difficult to work out the impact
>> of changing mergeFactor. I want to be able to measure how long merges take, run queries during
>> the merge activity and see what the response times are, etc.
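Once merge start and end times have been captured (how to capture them is the open question asked elsewhere in this thread), comparing query latency inside and outside those windows is straightforward. The following is a minimal sketch with hypothetical `MergeWindow` and `QuerySample` types; the sample numbers in `main` are made up purely to exercise the code.

```java
import java.util.List;

public class MergeWindowLatency {

    /** Hypothetical record of a merge between startMs (inclusive) and endMs (exclusive). */
    record MergeWindow(long startMs, long endMs) {
        boolean contains(long t) { return t >= startMs && t < endMs; }
        long durationMs() { return endMs - startMs; }
    }

    /** Hypothetical record of one test query: when it was sent and how long it took. */
    record QuerySample(long sentAtMs, long latencyMs) {}

    static boolean duringMerge(long t, List<MergeWindow> merges) {
        for (MergeWindow m : merges) {
            if (m.contains(t)) return true;
        }
        return false;
    }

    /** Mean latency of queries sent inside (inside=true) or outside every merge window; NaN if none. */
    static double meanLatency(List<QuerySample> samples, List<MergeWindow> merges, boolean inside) {
        return samples.stream()
                .filter(s -> duringMerge(s.sentAtMs(), merges) == inside)
                .mapToLong(QuerySample::latencyMs)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // Made-up numbers purely to exercise the code.
        List<MergeWindow> merges = List.of(new MergeWindow(1_000, 5_000));
        List<QuerySample> samples = List.of(
                new QuerySample(500, 20), new QuerySample(2_000, 80),
                new QuerySample(4_000, 60), new QuerySample(6_000, 30));
        System.out.println("merge took ms: " + merges.get(0).durationMs());             // 4000
        System.out.println("during merges: " + meanLatency(samples, merges, true));   // 70.0
        System.out.println("outside merges: " + meanLatency(samples, merges, false)); // 25.0
    }
}
```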
>>
>> Thanks
>> Prabhu
>>
>> -----Original Message-----
>> From: Erick Erickson [mailto:erickerickson@gmail.com]
>> Sent: 02 May 2012 12:40
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Merge during off peak times
>>
>> Why do you care? Merging is generally a background process, or are
>> you doing heavy indexing? In a master/slave setup,
>> it's usually not really relevant except that (with 3.x), massive merges
>> may temporarily stop indexing. Is that the problem?
>>
>> Look at the merge policies; there are configurations that make
>> this less painful.
>>
>> In trunk, DocumentsWriterPerThread lets segment flushes happen in the
>> background, which helps the long-pause-while-indexing problem.
>>
>> Best
>> Erick
>>
>> On Wed, May 2, 2012 at 7:22 AM, Prakashganesh, Prabhu
>> <Prabhu.Prakashganesh@dowjones.com> wrote:
>>> Ok, thanks Otis
>>> Another question on merging
>>> What is the best way to monitor merging?
>>> Is there something in the log file that I can look for?
>>> It seems like I have to monitor the system resources - read/write IOPS etc.
>>> - and work out when a merge happened
>>> It would be great if I could do it by looking at log files or in the admin UI.
>>> Do you know if this can be done or if there is some tool for this?
>>>
>>> Thanks
>>> Prabhu
>>>
>>> -----Original Message-----
>>> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
>>> Sent: 01 May 2012 15:12
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Solr Merge during off peak times
>>>
>>> Hi Prabhu,
>>>
>>> I don't think such a merge policy exists, but it would be nice to have this option,
>>> and I imagine it wouldn't be hard to write if you really just base the merge/no-merge decision
>>> on the time of day (and maybe the day of the week).
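Otis's idea reduces the merge/no-merge decision to a time-of-day (and day-of-week) test. The sketch below shows only that predicate, with an assumed nightly window of 22:00 to 06:00 plus weekends; `OffPeakWindow` is a hypothetical class, not part of Lucene. Wiring it into an actual Lucene `MergePolicy` is not shown, since those method signatures vary between Lucene versions, and a real policy would also need to make sure deferred merges get picked up once the window opens.

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.time.LocalTime;

public class OffPeakWindow {

    private final LocalTime start; // window opens (inclusive)
    private final LocalTime end;   // window closes (exclusive); may be earlier than start

    OffPeakWindow(LocalTime start, LocalTime end) {
        this.start = start;
        this.end = end;
    }

    /** Hypothetical rule: merging allowed inside the nightly window or anywhere on a weekend. */
    boolean allowsMerge(LocalDateTime t) {
        DayOfWeek day = t.getDayOfWeek();
        if (day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY) {
            return true;
        }
        LocalTime time = t.toLocalTime();
        if (start.isBefore(end)) {
            return !time.isBefore(start) && time.isBefore(end);
        }
        // Window wraps past midnight, e.g. 22:00 to 06:00.
        return !time.isBefore(start) || time.isBefore(end);
    }

    public static void main(String[] args) {
        OffPeakWindow window = new OffPeakWindow(LocalTime.of(22, 0), LocalTime.of(6, 0));
        System.out.println(window.allowsMerge(LocalDateTime.of(2012, 5, 2, 14, 26))); // false (Wed afternoon)
        System.out.println(window.allowsMerge(LocalDateTime.of(2012, 5, 2, 23, 0)));  // true (Wed night)
        System.out.println(window.allowsMerge(LocalDateTime.of(2012, 5, 5, 14, 0)));  // true (Saturday)
    }
}
```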
>>>
>>> Note that this should go into Lucene, not Solr, so if you decide to contribute
>>> your work, please see http://wiki.apache.org/lucene-java/HowToContribute
>>>
>>> Otis
>>> ----
>>> Performance Monitoring for Solr - http://sematext.com/spm
>>>
>>>
>>>
>>>
>>>>________________________________
>>>> From: "Prakashganesh, Prabhu" <Prabhu.Prakashganesh@dowjones.com>
>>>>To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
>>>>Sent: Tuesday, May 1, 2012 8:45 AM
>>>>Subject: Solr Merge during off peak times
>>>>
>>>>Hi,
>>>>  I would like to know if there is a way to configure the index merge policy
>>>> in Solr so that merging happens during off-peak hours. Can you please let me know if such
>>>> a merge policy configuration exists?
>>>>
>>>>Thanks
>>>>Prabhu
>>>>
>>>>
>>>>
