lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: High cpu and gc time when performing optimization.
Date Wed, 13 Jul 2016 00:14:32 GMT
On 7/12/2016 9:45 AM, Jason wrote:
> I'm using optimize because it's an option for fast search. Our index
> is updated once or more per week. If I don't use optimize, many index
> files will be kept. Are there any performance issues in that case? I'm
> also wondering about the relation between index file size and heap size.
> For a master server that only updates the index, is there any guide for
> heap settings, including Xmx, NewSize, MaxNewSize, etc.?

In older (2.x and 3.x) versions of Lucene, optimizing an index would
make a huge difference in performance.  In modern versions, the
performance increase from an optimize is much less dramatic.  Lucene
(and by extension, Solr) has gotten very good at dealing with an index
composed of many segments.  The recommendation for the last few years
has been to AVOID doing an optimize unless it can be done during times
of very low query traffic, when the I/O load will not cause issues.

About the only good reason left for frequent optimizes is when the index
has many updates to existing documents, resulting in a very large
percentage of deleted documents in the index.  In that case, the
optimize will shrink the overall index size, which will make searches
faster and make relevancy scoring more accurate, because deleted
documents still count toward term statistics until their segments are
merged away.
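
As a rough illustration of what "a very large percentage of deleted
documents" means, here is a minimal Python sketch.  It assumes you have
already read the numDocs and maxDoc values for the core (the Solr admin UI
shows both on the core overview).  The function names and the 30 percent
threshold are hypothetical choices for this example, not a Solr
recommendation.

```python
def deleted_fraction(num_docs: int, max_doc: int) -> float:
    """Fraction of documents in the index that are marked deleted.

    num_docs: live (searchable) documents.
    max_doc:  live documents plus deleted ones not yet merged away.
    """
    if max_doc <= 0:
        return 0.0  # empty index: nothing is deleted
    return (max_doc - num_docs) / max_doc


def worth_optimizing(num_docs: int, max_doc: int, threshold: float = 0.30) -> bool:
    """Return True when the deleted fraction reaches the threshold.

    The 0.30 default is an arbitrary illustration value; pick one that
    suits your own index and query load.
    """
    return deleted_fraction(num_docs, max_doc) >= threshold


if __name__ == "__main__":
    # Hypothetical numbers: 1,000,000 slots in the index, 600,000 live docs.
    frac = deleted_fraction(600_000, 1_000_000)
    print(f"deleted: {frac:.0%}, optimize: {worth_optimizing(600_000, 1_000_000)}")
```

Even when a check like this says yes, the optimize should still be run
during a period of very low query traffic.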

There is no general information available for setting the heap size. 
There is also no general information available on "acceptable" index
size.  The following wiki page touches a little bit on the heap size topic:

https://wiki.apache.org/solr/SolrPerformanceProblems

The reason that there is no generic information available is covered here:

https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Thanks,
Shawn

