lucene-solr-user mailing list archives

From Modassar Ather <modather1...@gmail.com>
Subject Re: Parallel optimize of index on SolrCloud.
Date Wed, 09 Jul 2014 07:59:19 GMT
Hi All,

Thanks for your kind suggestions and inputs.

We have been optimizing the index and it has helped. We have already
done testing and benchmarking around memory and performance.
We see scope for improving the optimize step by running it in
parallel, so kindly suggest how this can be achieved.

Thanks,
Modassar


On Wed, Jul 9, 2014 at 11:48 AM, Shalin Shekhar Mangar <
shalinmangar@gmail.com> wrote:

> Hi Walter,
>
> I wonder why you think SolrCloud isn't necessary if you're indexing once
> per week. Isn't the automatic failover and auto-sharding still useful? One
> can also do custom sharding with SolrCloud if necessary.
>
>
> On Wed, Jul 9, 2014 at 11:38 AM, Walter Underwood <wunder@wunderwood.org>
> wrote:
>
> > More memory or faster disks will make a much bigger improvement than a
> > forced merge.
> >
> > What are you measuring? If it is average query time, that is not a good
> > measure. Look at 90th or 95th percentile. Test with queries from logs.
> >
> > No user can see a 10% or 20% difference. If your managers are watching
> > that, they are watching the wrong thing.
> >
> > If you are indexing once per week, you don't really need the complexity
> > of Solr Cloud. You can do manual sharding.
> >
> > wunder
> >
> > On Jul 8, 2014, at 10:55 PM, Modassar Ather <modather1981@gmail.com>
> > wrote:
> >
> > > Our index has almost 100M documents running on SolrCloud of 3 shards,
> > > and each shard has an index size of about 700GB (for the record, we are
> > > not using stored fields - our documents are pretty large). We perform a
> > > full indexing every weekend and during the week there are no updates made
> > > to the index. Most of the queries that we run are pretty complex with
> > > hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards,
> > > boosts etc. and take many minutes to execute. A difference of 10-20% is
> > > also a big advantage for us.
> > >
> > > We have been optimizing the index after indexing for years and it has
> > > worked well for us. Every once in a while, we upgrade Solr to the latest
> > > version and try without optimizing so that we can save the many hours it
> > > takes to optimize such a huge index, but it does not work well.
> > >
> > > Kindly provide your suggestion.
> > >
> > > Thanks,
> > > Modassar
> > >
> > >
> > > On Wed, Jul 9, 2014 at 10:47 AM, Walter Underwood <wunder@wunderwood.org>
> > > wrote:
> > >
> > >> I seriously doubt that you are required to force merge.
> > >>
> > >> How much improvement? And is the big performance cost also OK?
> > >>
> > >> I have worked on search engines that do automatic merges and offer
> > >> forced merges for over fifteen years. For all that time, forced merges
> > >> have usually caused problems.
> > >>
> > >> Stop doing forced merges.
> > >>
> > >> wunder
> > >>
> > >> On Jul 8, 2014, at 10:09 PM, Modassar Ather <modather1981@gmail.com>
> > >> wrote:
> > >>
> > >>> Thanks Walter for your inputs.
> > >>>
> > >>> Our use case and performance benchmark require us to invoke optimize.
> > >>>
> > >>> Here we see a chance to improve the performance of optimize() if it is
> > >>> invoked in parallel.
> > >>> I found that if distrib=false is used, the optimization will happen in
> > >>> parallel.
> > >>>
> > >>> But I could not find a way to set it using HttpSolrServer/CloudSolrServer.
> > >>> Also, the parameter setting given in my earlier mail does not seem to
> > >>> work.
> > >>>
> > >>> Please let me know in what ways I can achieve the parallel optimize on
> > >>> SolrCloud.
> > >>>
> > >>> Thanks,
> > >>> Modassar
> > >>>
> > >>> On Tue, Jul 8, 2014 at 7:53 PM, Walter Underwood <wunder@wunderwood.org>
> > >>> wrote:
> > >>>
> > >>>> You probably do not need to force merge (mistakenly called "optimize")
> > >>>> your index.
> > >>>>
> > >>>> Solr does automatic merges, which work just fine.
> > >>>>
> > >>>> There are only a few situations where a forced merge is even a good
> > >>>> idea. The most common one is a replicated (non-cloud) setup with a full
> > >>>> reindex every night.
> > >>>>
> > >>>> If you need Solr Cloud, I cannot think of a situation where you would
> > >>>> want a forced merge.
> > >>>>
> > >>>> wunder
> > >>>>
> > >>>> On Jul 8, 2014, at 2:01 AM, Modassar Ather <modather1981@gmail.com>
> > >>>> wrote:
> > >>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>> I need to optimize an index created using the CloudSolrServer APIs under
> > >>>>> a SolrCloud setup of 3 instances on separate machines. Currently it
> > >>>>> optimizes sequentially if I invoke cloudSolrServer.optimize().
> > >>>>>
> > >>>>> To make it parallel I tried making three separate HttpSolrServer
> > >>>>> instances and invoked httpSolrServer.optimize() on them in parallel, but
> > >>>>> it still seems to be doing the optimization sequentially.
> > >>>>>
> > >>>>> I tried invoking optimize directly using HttpPost with the following URL
> > >>>>> and parameters, but it still seems to be sequential.
> > >>>>> *URL* : http://host:port/solr/collection/update
> > >>>>>
> > >>>>> *Parameters*:
> > >>>>> params.add(new BasicNameValuePair("optimize", "true"));
> > >>>>> params.add(new BasicNameValuePair("maxSegments", "1"));
> > >>>>> params.add(new BasicNameValuePair("waitFlush", "true"));
> > >>>>> params.add(new BasicNameValuePair("distrib", "false"));
> > >>>>>
> > >>>>> Kindly provide your suggestion and help.
> > >>>>>
> > >>>>> Regards,
> > >>>>> Modassar
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>
> > >> --
> > >> Walter Underwood
> > >> wunder@wunderwood.org
> > >>
> > >>
> > >>
> > >>
> >
> > --
> > Walter Underwood
> > wunder@wunderwood.org
> >
> >
> >
> >
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
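
For reference, the client side of the "parallel optimize" asked about in this thread could look roughly like the sketch below. It is only a sketch under assumptions: it uses plain JDK classes rather than SolrJ, the class and method names (ParallelOptimizeSketch, optimizeUrl, optimizeAll, httpGet) are made up for illustration, and the request parameters are simply the ones quoted in the original mail. Whether a given Solr version actually keeps the optimize local when distrib=false is passed is the claim under discussion in the thread and is not verified here; the sketch only makes sure the three requests leave the client at the same time.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelOptimizeSketch {

    // Builds the update URL using the parameters quoted in the thread.
    // Assumption: coreBaseUrl points at one core/collection, e.g. http://host:port/solr/collection
    static String optimizeUrl(String coreBaseUrl) {
        return coreBaseUrl
                + "/update?optimize=true&maxSegments=1&waitFlush=true&distrib=false";
    }

    // Submits one request per base URL on its own thread and waits for all of them.
    // The sender is passed in so the HTTP call can be swapped out (or faked in a test).
    static List<Integer> optimizeAll(List<String> coreBaseUrls,
                                     Function<String, Integer> sender) {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, coreBaseUrls.size()));
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String base : coreBaseUrls) {
                tasks.add(() -> sender.apply(optimizeUrl(base)));
            }
            List<Integer> statuses = new ArrayList<>();
            // invokeAll blocks until every task has finished; results keep task order.
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                statuses.add(f.get());
            }
            return statuses;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // Plain-JDK sender: issues a GET and returns the HTTP status code.
    static int httpGet(String url) {
        try {
            HttpURLConnection c =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            c.setRequestMethod("GET");
            return c.getResponseCode();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call would then be along the lines of ParallelOptimizeSketch.optimizeAll(urls, ParallelOptimizeSketch::httpGet), where urls holds one base URL per shard. If the server still forwards each optimize to the other shards, sending the requests concurrently will not by itself make the merges run in parallel.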

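On the measurement point raised above (look at the 90th or 95th percentile rather than the average query time), a small sketch of how the two can differ. The class name LatencyPercentiles and the sample timings in the usage note are invented for illustration and are not measurements from this thread; the percentile uses the simple nearest-rank definition, which is one of several conventions.

```java
import java.util.Arrays;

class LatencyPercentiles {

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Arithmetic mean, for comparison with the percentiles.
    static double mean(double[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }
}
```

With ten made-up query times in seconds, nine of 60 and one of 600, the mean is 114 while the nearest-rank 90th percentile is 60 and the 95th percentile is 600. The average describes none of the queries that were actually run, which is why percentiles computed over logged queries are the more useful figure.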