lucene-solr-user mailing list archives

From Walter Underwood <wun...@wunderwood.org>
Subject Re: Parallel optimize of index on SolrCloud.
Date Wed, 09 Jul 2014 06:08:08 GMT
More memory or faster disks will make a much bigger improvement than a forced merge.

What are you measuring? If it is average query time, that is not a good measure. Look at 90th
or 95th percentile. Test with queries from logs.
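
To illustrate why the average is a weak measure, here is a minimal sketch in Java. The class name, method name, and sample timings are made up for this example and are not from the thread. It uses the nearest-rank definition of a percentile; other definitions interpolate between samples and give slightly different values.

```java
import java.util.Arrays;

class LatencyPercentiles {
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    // The input array must already be sorted in ascending order.
    static long percentile(long[] sortedMillis, int p) {
        if (sortedMillis.length == 0 || p < 1 || p > 100) {
            throw new IllegalArgumentException("need samples and 1 <= p <= 100");
        }
        // Integer ceiling of p * n / 100, which avoids floating-point rounding.
        int rank = (p * sortedMillis.length + 99) / 100;
        return sortedMillis[rank - 1];
    }

    static double mean(long[] millis) {
        long sum = 0;
        for (long m : millis) {
            sum += m;
        }
        return (double) sum / millis.length;
    }

    public static void main(String[] args) {
        // Hypothetical query times in milliseconds, e.g. replayed from logs.
        long[] qtimes = {120, 95, 110, 4000, 130, 105, 90, 115, 100, 125};
        Arrays.sort(qtimes);
        System.out.println("mean = " + mean(qtimes) + " ms");
        System.out.println("p90  = " + percentile(qtimes, 90) + " ms");
        System.out.println("p95  = " + percentile(qtimes, 95) + " ms");
    }
}
```

With these made-up samples, one slow query pulls the mean to 499 ms even though nine of the ten queries finish in 130 ms or less. The percentiles show the typical case and the slow tail separately.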

No user can see a 10% or 20% difference. If your managers are watching that, they are watching
the wrong thing.

If you are indexing once per week, you don't really need the complexity of Solr Cloud. You
can do manual sharding.

wunder

On Jul 8, 2014, at 10:55 PM, Modassar Ather <modather1981@gmail.com> wrote:

> Our index has almost 100M documents running on SolrCloud of 3 shards and
> each shard has an index size of about 700GB (for the record, we are not
> using stored fields - our documents are pretty large). We perform a full
> indexing every weekend and during the week there are no updates made to the
> index. Most of the queries that we run are pretty complex with hundreds of
> terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards, boosts etc.
> and take many minutes to execute. A difference of 10-20% is also a big
> advantage for us.
> 
> We have been optimizing the index after indexing for years and it has
> worked well for us. Every once in a while, we upgrade Solr to the latest
> version and try without optimizing so that we can save the many hours it
> takes to optimize such a huge index, but it does not work well.
> 
> Kindly provide your suggestion.
> 
> Thanks,
> Modassar
> 
> 
> On Wed, Jul 9, 2014 at 10:47 AM, Walter Underwood <wunder@wunderwood.org>
> wrote:
> 
>> I seriously doubt that you are required to force merge.
>> 
>> How much improvement? And is the big performance cost also OK?
>> 
>> I have worked on search engines that do automatic merges and offer forced
>> merges for over fifteen years. For all that time, forced merges have
>> usually caused problems.
>> 
>> Stop doing forced merges.
>> 
>> wunder
>> 
>> On Jul 8, 2014, at 10:09 PM, Modassar Ather <modather1981@gmail.com>
>> wrote:
>> 
>>> Thanks Walter for your inputs.
>>> 
>>> Our use case and performance benchmark requires us to invoke optimize.
>>> 
>>> Here we see a chance of improvement in performance of optimize() if
>>> invoked in parallel.
>>> I found that if *distrib=false* is used, the optimization will happen in
>>> parallel.
>>> 
>>> But I could not find a way to set it using
>>> HttpSolrServer/CloudSolrServer.
>>> Also, the parameter settings given in my mail above do not seem to
>>> work.
>>> 
>>> Please let me know in what ways I can achieve the parallel optimize on
>>> SolrCloud.
>>> 
>>> Thanks,
>>> Modassar
>>> 
>>> On Tue, Jul 8, 2014 at 7:53 PM, Walter Underwood <wunder@wunderwood.org>
>>> wrote:
>>> 
>>>> You probably do not need to force merge (mistakenly called "optimize")
>>>> your index.
>>>> 
>>>> Solr does automatic merges, which work just fine.
>>>> 
>>>> There are only a few situations where a forced merge is even a good
>>>> idea. The most common one is a replicated (non-cloud) setup with a
>>>> full reindex every night.
>>>> 
>>>> If you need Solr Cloud, I cannot think of a situation where you would
>>>> want a forced merge.
>>>> 
>>>> wunder
>>>> 
>>>> On Jul 8, 2014, at 2:01 AM, Modassar Ather <modather1981@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> Need to optimize index created using CloudSolrServer APIs under
>>>>> SolrCloud setup of 3 instances on separate machines. Currently it
>>>>> optimizes sequentially if I invoke cloudSolrServer.optimize().
>>>>> 
>>>>> To make it parallel I tried making three separate HttpSolrServer
>>>>> instances and invoked httpSolrServer.optimize() on them in parallel,
>>>>> but it still seems to be doing the optimization sequentially.
>>>>> 
>>>>> I tried invoking optimize directly using HttpPost with the following
>>>>> URL and parameters, but it still seems to be sequential.
>>>>> *URL* : http://host:port/solr/collection/update
>>>>> 
>>>>> *Parameters*:
>>>>> params.add(new BasicNameValuePair("optimize", "true"));
>>>>> params.add(new BasicNameValuePair("maxSegments", "1"));
>>>>> params.add(new BasicNameValuePair("waitFlush", "true"));
>>>>> params.add(new BasicNameValuePair("distrib", "false"));
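
For reference, a minimal sketch in Java of sending the request described above to each core at the same time, using only the JDK. The class name, method names, and core URLs are hypothetical, and the sketch has not been run against a real cluster. It shows only the client side, where the requests go out concurrently. Whether Solr then merges in parallel on the servers is the open question in this thread.

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelOptimizeSketch {
    // Builds the update URL with the parameters quoted in the message above
    // (waitFlush omitted). coreBaseUrl is assumed to point at one core.
    static String optimizeUrl(String coreBaseUrl) {
        String base = coreBaseUrl.endsWith("/")
                ? coreBaseUrl.substring(0, coreBaseUrl.length() - 1)
                : coreBaseUrl;
        return base + "/update?optimize=true&maxSegments=1&distrib=false";
    }

    // Sends one optimize request and blocks until the server responds.
    static int sendOptimize(String coreBaseUrl) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(optimizeUrl(coreBaseUrl)).openConnection();
        conn.setRequestMethod("GET");
        conn.setReadTimeout(0); // 0 = no read timeout; a forced merge can take hours
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // Submits one request per core on its own thread and waits for all of them.
    static List<Integer> optimizeAll(List<String> coreBaseUrls) throws Exception {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, coreBaseUrls.size()));
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (final String url : coreBaseUrls) {
                tasks.add(() -> sendOptimize(url));
            }
            List<Integer> statuses = new ArrayList<>();
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                statuses.add(f.get());
            }
            return statuses;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Pass one core URL per argument, for example (hypothetical):
        // http://host1:8983/solr/collection_shard1_replica1
        System.out.println(optimizeAll(Arrays.asList(args)));
    }
}
```

Each URL should address a specific core on a specific machine rather than the collection as a whole, so that every request reaches a different shard.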
>>>>> 
>>>>> Kindly provide your suggestion and help.
>>>>> 
>>>>> Regards,
>>>>> Modassar
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>> 
>> --
>> Walter Underwood
>> wunder@wunderwood.org
>> 
>> 
>> 
>> 

--
Walter Underwood
wunder@wunderwood.org



