lucene-solr-user mailing list archives

From Michael Joyner <mich...@newsrx.com>
Subject Re: What is the bottleneck for an optimise operation? / solve the disk space and time issues by specifying multiple segments to optimize
Date Thu, 02 Mar 2017 18:42:10 GMT
You can solve the disk space and time issues by specifying multiple 
segments to optimize down to instead of a single segment.

When we reindex we have to optimize, otherwise we end up with hundreds 
of segments and very poor performance.

We optimize down to about 16 segments. In our experience that avoids the 
3x disk space overhead and usually finishes in a reasonable amount of 
time (we have more than 50 million articles in one of our Solr indexes).
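
For anyone who wants to try this, here is a minimal sketch of how such a 
request could be sent to Solr's update handler, using the `optimize` and 
`maxSegments` request parameters. The host, port and the collection name 
"articles" are placeholders rather than our actual setup, and the helper 
names are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def optimize_url(base_url: str, collection: str, max_segments: int = 16) -> str:
    """Build an update-handler URL that asks Solr to merge the index
    down to at most `max_segments` segments instead of a single one."""
    if max_segments < 1:
        raise ValueError("max_segments must be at least 1")
    params = urlencode({"optimize": "true", "maxSegments": max_segments})
    return f"{base_url.rstrip('/')}/{collection}/update?{params}"


def run_optimize(base_url: str, collection: str, max_segments: int = 16) -> int:
    """Send the optimize request and return the HTTP status code.

    The call blocks until Solr has finished merging, which can take a
    long time on a large index, so no timeout is set here.
    """
    with urlopen(optimize_url(base_url, collection, max_segments)) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical local instance and collection name.
    print(optimize_url("http://localhost:8983/solr", "articles"))
```

Calling `run_optimize("http://localhost:8983/solr", "articles")` would 
issue the request against a running instance. The target of 16 is simply 
what works for us, so treat it as a starting point to tune.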


On 03/02/2017 10:20 AM, David Hastings wrote:
> Agreed, and the fact that it takes three times the space is part of the
> reason it takes so long: that 190 GB index ends up writing another 380 GB
> until it compresses down and deletes the two leftover files. It's a pretty
> hefty operation.
>
> On Thu, Mar 2, 2017 at 10:13 AM, Alexandre Rafalovitch <arafalov@gmail.com>
> wrote:
>
>> The optimize operation is no longer recommended for Solr, as the
>> background merges got a lot smarter.
>>
>> It is an extremely expensive operation that can require up to three
>> times the index size in disk space during processing.
>>
>> This is not to say yours is not a valid question, which I am leaving to
>> others to answer.
>>
>> Regards,
>>     Alex.
>> ----
>> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>>
>>
>> On 2 March 2017 at 10:04, Caruana, Matthew <mcaruana@icij.org> wrote:
>>> I’m currently performing an optimise operation on a ~190GB index with
>>> about 4 million documents. The process has been running for hours.
>>> This is surprising, because the machine is an EC2 r4.xlarge with four
>>> cores and 30GB of RAM, 24GB of which is allocated to the JVM.
>>> The load average has been steady at about 1.3. Memory usage is 25% or
>>> less the whole time. iostat reports ~6% util.
>>> What gives?
>>>
>>> Running Solr 6.4.1.

