lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: improve indexing speed with nomergepolicy
Date Thu, 07 Aug 2014 15:42:37 GMT
This is a good idea, because sometimes it's nice to change the MergePolicy on the fly without
reopening! One example is https://issues.apache.org/jira/browse/LUCENE-5526
In my case, I would like to open an IndexWriter, set its merge policy to UpgradeIndexMergePolicy,
force a merge to upgrade all segments, and then proceed with normal indexing and other stuff.
Currently you have to close the IW, which is bad in multithreaded environments. For example, you start an
index upgrade after installing a new version of your favourite Solr/ES/... server, but you need
to keep indexing documents in parallel (real-time system), i.e. with little downtime.
The proposal in the above issue is to allow passing a MergePolicy to forceMerge().

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Shai Erera [mailto:serera@gmail.com]
> Sent: Thursday, August 07, 2014 4:11 PM
> To: java-user@lucene.apache.org
> Subject: Re: improve indexing speed with nomergepolicy
> 
> Yes, currently an MP isn't a "live" setting on IndexWriter, meaning you pass it
> at construction time and don't change it afterwards. I wonder if after
> LUCENE-5711 we can move MergePolicy to LiveIndexWriterConfig and fix
> IndexWriter to not hold on to it, but rather pull it from the config.
> 
> Not sure what others think about it.
> 
> Shai
> 
> 
> On Thu, Aug 7, 2014 at 5:05 PM, Jon Stewart <jon@lightboxtechnologies.com> wrote:
> 
> > Related, how does one change the MergePolicy on an IndexWriter (e.g.,
> > use NoMergePolicy during batch indexing, then change to something
> > better once finished with batch)? It looks like the MergePolicy is set
> > through IndexWriterConfig but I don't see a way to update an IWC on an
> > IW.
> >
> > Thanks,
> >
> > Jon
> >
> >
> > On Thu, Aug 7, 2014 at 7:37 AM, Shai Erera <serera@gmail.com> wrote:
> > > Using NoMergePolicy for online indexes is usually not recommended.
> > > You want to use NoMP in cases where you build an index in a batch
> > > job; then, at the end, before the index is "published", you run a
> > > forceMerge or maybeMerge (with a real MergePolicy).
> > >
> > > For online indexes, i.e. indexes that are being searched while they
> > > are updated, if you use NoMP you will accumulate many segments in the
> > > index. This means higher resource consumption overall: file handles,
> > > RAM, potentially disk space, and usually results in slower searches.
> > >
> > > You may want to tweak the default MP's settings though, to not kick
> > > off a merge unless there are a large number of segments in the
> > > index. E.g. the default MP merges segments when there are 10 at the
> > > same level (i.e. roughly the same size). You can increase that.
> > >
> > > Also, do you use NRTCachingDirectory? It's usually recommended for
> > > NRT, even with default MP, since the tiny segments are merged
> > > in-memory, and your NRT reopens don't result in flushing new segments
> > > to disk.
> > >
> > > Shai
> > >
> > >
> > > On Thu, Aug 7, 2014 at 1:14 PM, Sascha Janz <Sascha.Janz@gmx.net> wrote:
> > >
> > >> hi,
> > >>
> > >> i am trying to speed up our indexing process. we use SearcherManager with
> > >> applyDeletes to get a near-real-time reader.
> > >>
> > >> we do not really have "many" incoming documents, but the documents
> > >> must be updated from time to time, and the number of documents to be
> > >> updated could be quite large.
> > >>
> > >> i tried some tests with NoMergePolicy and the indexing process was
> > >> 25 % faster.
> > >>
> > >> so i am thinking of changing our code to use NoMergePolicy during a
> > >> specific time interval, when users are active, and to do a
> > >> forceMerge(20) every night, which lasts about 2 - 5 minutes.
> > >>
> > >> is this a good idea? or will i perhaps get into trouble?
> > >>
> > >> Sascha
> > >>
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >> For additional commands, e-mail: java-user-help@lucene.apache.org
> > >>
> > >>
> >
> >
> >
> > --
> > Jon Stewart, Principal
> > (646) 719-0317 | jon@lightboxtechnologies.com | Arlington, VA
> >
> >
> >
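
Shai's point about segment accumulation in the quoted thread can be illustrated with a toy model. This is a sketch only, not Lucene code: it assumes every flush produces one equally sized segment and that a simplified log-style policy merges as soon as `mergeFactor` segments sit at the same level. The class `SegmentCountSketch` and the method `segmentsAfter` are hypothetical names introduced for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of segment accumulation; NOT Lucene's actual merge logic. */
public class SegmentCountSketch {

    /**
     * Simulates the given number of flushes, each producing one level-0 segment.
     * Whenever mergeFactor segments share a level, they are replaced by a single
     * segment on the next level. A mergeFactor <= 0 means "never merge".
     * Returns the number of segments that remain at the end.
     */
    public static int segmentsAfter(int flushes, int mergeFactor) {
        List<Integer> countPerLevel = new ArrayList<>();
        for (int i = 0; i < flushes; i++) {
            addSegment(countPerLevel, 0, mergeFactor);
        }
        int total = 0;
        for (int count : countPerLevel) {
            total += count;
        }
        return total;
    }

    /** Adds one segment at the given level, cascading merges upward if needed. */
    private static void addSegment(List<Integer> countPerLevel, int level, int mergeFactor) {
        while (countPerLevel.size() <= level) {
            countPerLevel.add(0);
        }
        int count = countPerLevel.get(level) + 1;
        if (mergeFactor > 0 && count == mergeFactor) {
            // The level is full: merge its segments into one segment one level up.
            countPerLevel.set(level, 0);
            addSegment(countPerLevel, level + 1, mergeFactor);
        } else {
            countPerLevel.set(level, count);
        }
    }

    public static void main(String[] args) {
        int flushes = 999;
        System.out.println("no merging:      " + segmentsAfter(flushes, 0));
        System.out.println("merge factor 10: " + segmentsAfter(flushes, 10));
        System.out.println("merge factor 30: " + segmentsAfter(flushes, 30));
    }
}
```

In this model the segment count without merging grows linearly with the number of flushes, while a larger merge factor defers merges at the cost of a somewhat higher steady-state segment count. Real merge policies also weigh segment sizes and deletes, which the sketch ignores.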

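The idea discussed in this thread, a writer that pulls its merge policy from a live config instead of holding on to the instance passed at construction, might look roughly like the sketch below. None of this is Lucene's API: `MergePolicy`, `LiveConfig`, and `Writer` are hypothetical stand-ins, and the "merge" simply collapses all segments into one. It only shows why reading the policy on every use lets it be swapped without closing the writer.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Design sketch with hypothetical types; NOT the Lucene API. */
public class LiveMergePolicySketch {

    /** Hypothetical stand-in for a merge policy: decides whether to merge now. */
    interface MergePolicy {
        boolean shouldMerge(int segmentCount);
    }

    /** Hypothetical "live" config whose merge policy can be replaced at any time. */
    static final class LiveConfig {
        private final AtomicReference<MergePolicy> mergePolicy;

        LiveConfig(MergePolicy initial) {
            this.mergePolicy = new AtomicReference<>(initial);
        }

        MergePolicy getMergePolicy() {
            return mergePolicy.get();
        }

        void setMergePolicy(MergePolicy policy) {
            mergePolicy.set(policy);
        }
    }

    /** Hypothetical writer that never caches the policy it was created with. */
    static final class Writer {
        private final LiveConfig config;
        private int segmentCount;

        Writer(LiveConfig config) {
            this.config = config;
        }

        /** Each flush adds a segment, then consults the policy that is current right now. */
        void flush() {
            segmentCount++;
            if (config.getMergePolicy().shouldMerge(segmentCount)) {
                segmentCount = 1; // toy merge: collapse everything into one segment
            }
        }

        int segmentCount() {
            return segmentCount;
        }
    }

    /** Batch phase without merging, then a policy switch on the still-open writer. */
    public static int[] demo() {
        LiveConfig config = new LiveConfig(count -> false); // "no merge" during the batch
        Writer writer = new Writer(config);
        for (int i = 0; i < 25; i++) {
            writer.flush();
        }
        int afterBatch = writer.segmentCount();

        config.setMergePolicy(count -> count >= 10); // swap policy; writer stays open
        writer.flush();
        return new int[] { afterBatch, writer.segmentCount() };
    }

    public static void main(String[] args) {
        int[] counts = demo();
        System.out.println("segments after batch:  " + counts[0]);
        System.out.println("segments after switch: " + counts[1]);
    }
}
```

The `AtomicReference` is one way to make the swap visible to indexing threads without locking; whether a real implementation would need stronger coordination with in-flight merges is left open here.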
