lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: What is the bottleneck for an optimise operation? / solve the disk space and time issues by specifying multiple segments to optimize
Date Fri, 03 Mar 2017 01:42:21 GMT
What do you have for merge configuration in solrconfig.xml? You should
be able to tune it to - approximately - whatever you want without
doing the grand optimize:
https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig#IndexConfiginSolrConfig-MergingIndexSegments
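
For illustration, merging is configured in the <indexConfig> section of solrconfig.xml; TieredMergePolicyFactory is Solr's standard merge policy factory in 6.x, but the values below are only illustrative, not recommendations:

```xml
<!-- solrconfig.xml: controls how many segments may accumulate before
     background merges kick in (illustrative values) -->
<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicyFactory>
</indexConfig>
```

Lowering segmentsPerTier keeps the steady-state segment count down, at the cost of more frequent background merge I/O.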

Regards,
   Alex.
----
http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 2 March 2017 at 16:37, Caruana, Matthew <mcaruana@icij.org> wrote:
> Yes, we already do it outside Solr. See https://github.com/ICIJ/extract,
> which we developed for this purpose. My guess is that the documents are
> very large, as you say.
>
> Optimising was always an attempt to bring down the number of segments
> from 60+. Not sure how else to do that.
>
>> On 2 Mar 2017, at 7:42 pm, Michael Joyner <michael@newsrx.com> wrote:
>>
>> You can solve the disk space and time issues by specifying multiple
>> segments to optimize down to, instead of a single segment.
>>
>> When we reindex we have to optimize, or we end up with hundreds of
>> segments and very horrible performance.
>>
>> We optimize down to 16 segments or so, and it doesn't do the 3x disk
>> space thing and usually runs in a decent amount of time. (We have >50
>> million articles in one of our Solr indexes.)
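
A target segment count (rather than the default of one) can be passed to the optimize command through the update handler; the exact core path is a placeholder here:

```xml
<!-- POST this to /solr/<core>/update: merges the index down to at
     most 16 segments instead of a single one -->
<optimize maxSegments="16"/>
```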
>>
>>
>>> On 03/02/2017 10:20 AM, David Hastings wrote:
>>> Agreed, and the fact that it takes three times the space is part of the
>>> reason it takes so long: that 190GB index ends up writing another 380GB
>>> before it compresses down and deletes the two leftover files. It's a
>>> pretty hefty operation.
>>>
>>> On Thu, Mar 2, 2017 at 10:13 AM, Alexandre Rafalovitch <arafalov@gmail.com>
>>> wrote:
>>>
>>>> The optimize operation is no longer recommended for Solr, as the
>>>> background merges have become a lot smarter.
>>>>
>>>> It is an extremely expensive operation that can require up to three
>>>> times the amount of disk space during processing.
>>>>
>>>> This is not to say yours isn't a valid question, which I am leaving
>>>> for others to answer.
>>>>
>>>> Regards,
>>>>    Alex.
>>>> ----
>>>> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>>>>
>>>>
>>>>> On 2 March 2017 at 10:04, Caruana, Matthew <mcaruana@icij.org> wrote:
>>>>> I’m currently performing an optimise operation on a ~190GB index with
>>>>> about 4 million documents. The process has been running for hours.
>>>>>
>>>>> This is surprising, because the machine is an EC2 r4.xlarge with four
>>>>> cores and 30GB of RAM, 24GB of which is allocated to the JVM.
>>>>>
>>>>> The load average has been steady at about 1.3. Memory usage is 25% or
>>>>> less the whole time. iostat reports ~6% util.
>>>>>
>>>>> What gives?
>>>>>
>>>>> Running Solr 6.4.1.
>>
