lucene-solr-user mailing list archives

From Upayavira ...@odoko.co.uk>
Subject Re: Index optimize runs in background.
Date Wed, 27 May 2015 11:00:14 GMT
In this case, optimising makes sense: once the index is generated, you
are not updating it.

Upayavira

On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote:
> Our index has almost 100M documents running on a SolrCloud cluster of 5
> shards, and each shard has an index size of about 170+GB (for the record,
> we are not using stored fields - our documents are pretty large). We
> perform a full indexing every weekend and during the week there are no
> updates made to the index. Most of the queries that we run are pretty
> complex, with hundreds of terms using PhraseQuery, BooleanQuery,
> SpanQuery, Wildcards, boosts etc., and take many minutes to execute. A
> difference of 10-20% is also a big advantage for us.
> 
> We have been optimizing the index after indexing for years and it has
> worked well for us. Every once in a while, we upgrade Solr to the latest
> version and try without optimizing so that we can save the many hours it
> takes to optimize such a huge index, but we find that the optimized index
> works well for us.
> 
> Erick, I was indexing the documents today and saw the optimize happening
> in the background.
> 
> On Tue, May 26, 2015 at 9:12 PM, Erick Erickson <erickerickson@gmail.com>
> wrote:
> 
> > No results yet. I finished the test harness last night (not really a
> > unit test, a stand-alone program that endlessly adds stuff and tests
> > that every commit returns the correct number of docs).
> >
> > 8,000 cycles later there aren't any problems reported.
> >
> > Siiigggggh.
> >
> >
> > On Tue, May 26, 2015 at 1:51 AM, Modassar Ather <modather1981@gmail.com>
> > wrote:
> > > Hi,
> > >
> > > Erick, you mentioned a unit test to check the optimize running in the
> > > background. Kindly share your findings, if any.
> > >
> > > Thanks,
> > > Modassar
> > >
> > > On Mon, May 25, 2015 at 11:47 AM, Modassar Ather <modather1981@gmail.com>
> > > wrote:
> > >
> > >> Thanks everybody for your replies.
> > >>
> > > >> I have noticed the optimization running in the background every time I
> > > >> indexed. This is a 5-node cluster with solr-5.1.0 and uses the
> > > >> CloudSolrClient. Kindly share your findings on this issue.
> > >>
> > >> Our index has almost 100M documents running on SolrCloud. We have been
> > >> optimizing the index after indexing for years and it has worked well for
> > >> us.
> > >>
> > >> Thanks,
> > >> Modassar
> > >>
> > > >> On Fri, May 22, 2015 at 11:55 PM, Erick Erickson
> > > >> <erickerickson@gmail.com> wrote:
> > >>
> > > >>> Actually, I've recently seen very similar behavior in Solr 4.10.3,
> > > >>> but involving hard commits openSearcher=true, see:
> > > >>> https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't
> > > >>> reproduce this at will, siigggghhhh.
> > >>>
> > > >>> A unit test should be very simple to write though, maybe I can get
> > > >>> to it today.
> > >>>
> > >>> Erick
> > >>>
> > >>>
> > >>>
> > >>> On Fri, May 22, 2015 at 8:27 AM, Upayavira <uv@odoko.co.uk> wrote:
> > >>> >
> > >>> >
> > >>> > On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote:
> > >>> >> On 5/21/2015 6:21 AM, Modassar Ather wrote:
> > > >>> >> > I am using Solr-5.1.0. I have an indexer class which invokes
> > > >>> >> > cloudSolrClient.optimize(true, true, 1). My indexer exits after
> > > >>> >> > the invocation of optimize and the optimization keeps on
> > > >>> >> > running in the background.
> > > >>> >> > Kindly let me know if this is by design and how I can make my
> > > >>> >> > indexer wait until the optimization is over. Is there a
> > > >>> >> > configuration/parameter I need to set for this?
> > > >>> >> >
> > > >>> >> > Please note that the same indexer with
> > > >>> >> > cloudSolrServer.optimize(true, true, 1) on Solr-4.10 used to
> > > >>> >> > wait till the optimize was over before exiting.
> > >>> >>
> > > >>> >> This is very odd, because I could not get HttpSolrServer to
> > > >>> >> optimize in the background, even when that was what I wanted.
> > > >>> >>
> > > >>> >> I wondered if maybe the Cloud object behaves differently with
> > > >>> >> regard to blocking until an optimize is finished ... except that
> > > >>> >> there is no code for optimizing in CloudSolrClient at all ... so I
> > > >>> >> don't know where the different behavior would actually be
> > > >>> >> happening.
> > >>> >
> > > >>> > A more important question is, why are you optimising? Generally
> > > >>> > it isn't recommended anymore as it reduces the natural
> > > >>> > distribution of documents amongst segments and makes future
> > > >>> > merges more costly.
> > >>> >
> > >>> > Upayavira
> > >>>
> > >>
> > >>
> >
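
For reference, the optimize(true, true, 1) call discussed in this thread corresponds to an optimize request on Solr's update handler with waitSearcher=true and maxSegments=1. Below is a minimal sketch, using only the JDK, of issuing that request directly over HTTP and waiting for the response. The class name, the base URL, and the collection name "collection1" are hypothetical placeholders, not taken from the thread. Whether the response is held until the merge has finished on every shard in SolrCloud 5.1 is exactly the open question here, so treat this as a way to observe the behavior rather than a fix.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper class; not part of Solr or SolrJ.
final class OptimizeRequest {

    // Builds the update-handler URL for an explicit optimize.
    // baseUrl and collection are assumptions supplied by the caller.
    static URI optimizeUri(String baseUrl, String collection,
                           boolean waitSearcher, int maxSegments) {
        // Tolerate a trailing slash on the base URL.
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return URI.create(base + "/" + collection
                + "/update?optimize=true"
                + "&waitSearcher=" + waitSearcher
                + "&maxSegments=" + maxSegments);
    }

    // Sends the request and blocks until Solr answers. No request timeout
    // is set, so the calling thread waits for as long as the server takes.
    static int send(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```

An indexer could call OptimizeRequest.send(OptimizeRequest.optimizeUri("http://localhost:8983/solr", "collection1", true, 1)) as its last step and log how long the call takes, which makes it easy to see whether the request returns before the merge is done.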
