lucene-java-user mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: improve indexing speed with nomergepolicy
Date Thu, 14 Aug 2014 09:37:51 GMT
I opened https://issues.apache.org/jira/browse/LUCENE-5883 to handle that.

Shai


On Thu, Aug 7, 2014 at 6:42 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

> This is a good idea, because sometimes it's nice to change the MergePolicy
> on the fly without reopening! One example is
> https://issues.apache.org/jira/browse/LUCENE-5526
> In my case, I would like to open an IndexWriter, set its merge policy to
> UpgradeIndexMergePolicy, force a merge to upgrade all segments and then
> proceed with normal indexing and other stuff. Currently you have to close
> the IW, which is bad in multithreaded environments: you may start an index
> upgrade after installing a new version of your favourite Solr/ES/...
> server, but still need to index documents in parallel (real-time system)
> with little downtime.
> The proposal in the above issue is to allow passing a MergePolicy to
> forceMerge().
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
> > -----Original Message-----
> > From: Shai Erera [mailto:serera@gmail.com]
> > Sent: Thursday, August 07, 2014 4:11 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: improve indexing speed with nomergepolicy
> >
> > Yes, currently an MP isn't a "live" setting on IndexWriter, meaning you
> > pass it at construction time and don't change it afterwards. I wonder if
> > after LUCENE-5711 we can move MergePolicy to LiveIndexWriterConfig and fix
> > IndexWriter to not hold on to it, but rather pull it from the config.
> >
> > Not sure what others think about it.
> >
> > Shai
> >
> >
> > On Thu, Aug 7, 2014 at 5:05 PM, Jon Stewart <jon@lightboxtechnologies.com>
> > wrote:
> >
> > > Related, how does one change the MergePolicy on an IndexWriter (e.g.,
> > > use NoMergePolicy during batch indexing, then change to something
> > > better once finished with batch)? It looks like the MergePolicy is set
> > > through IndexWriterConfig but I don't see a way to update an IWC on an
> > > IW.
> > >
> > > Thanks,
> > >
> > > Jon
> > >
> > >
> > > On Thu, Aug 7, 2014 at 7:37 AM, Shai Erera <serera@gmail.com> wrote:
> > > > Using NoMergePolicy for online indexes is usually not recommended.
> > > > You want to use NoMP in cases where you build an index in a batch
> > > > job, then in the end, before the index is "published", you run a
> > > > forceMerge or maybeMerge (with a real MergePolicy).
> > > >
> > > > For online indexes, i.e. indexes that are being searched while they
> > > > are updated, if you use NoMP you will accumulate many segments in the
> > > > index. This means higher resource consumption overall: file handles,
> > > > RAM, potentially disk space, and usually results in slower searches.
> > > >
> > > > You may want to tweak the default MP's settings though, to not kick
> > > > off a merge unless there are a large number of segments in the
> > > > index. E.g. the default MP merges segments when there are 10 at the
> > > > same level (i.e. roughly the same size). You can increase that.
> > > >
> > > > Also, do you use NRTCachingDirectory? It's usually recommended for
> > > > NRT, even with default MP, since the tiny segments are merged
> > > > in-memory, and your NRT reopens don't result in flushing new segments
> > > > to disk.
> > > >
> > > > Shai
> > > >
> > > >
> > > > On Thu, Aug 7, 2014 at 1:14 PM, Sascha Janz <Sascha.Janz@gmx.net>
> > > > wrote:
> > > >
> > > >> hi,
> > > >>
> > > >> I am trying to speed up our indexing process. We use SearcherManager
> > > >> with applydeletes to get a near-real-time reader.
> > > >>
> > > >> We do not really have "many" incoming documents, but the documents
> > > >> must be updated from time to time, and the number of documents to be
> > > >> updated could be quite large.
> > > >>
> > > >> I ran some tests with NoMergePolicy and the indexing process was
> > > >> 25% faster.
> > > >>
> > > >> So I am thinking of changing our code to use NoMergePolicy for a
> > > >> specific time interval, when users are active, and to do a
> > > >> forceMerge(20) every night, which lasts about 2 to 5 minutes.
> > > >>
> > > >> Is this a good idea? Or will I perhaps get into trouble?
> > > >>
> > > >> Sascha
> > > >>
> > > >>
> > > >> ---------------------------------------------------------------------
> > > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > >> For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >>
> > > >>
> > >
> > >
> > >
> > > --
> > > Jon Stewart, Principal
> > > (646) 719-0317 | jon@lightboxtechnologies.com | Arlington, VA
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
>
>
>
>
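
To make the segment-accumulation point from the thread concrete, here is a toy model in plain Java. It is only a sketch and is not Lucene's merge logic: it assumes every flush writes exactly one segment, and that a merge fires as soon as a given number of segments (the "merge factor", 10 in the message quoted above) sit at the same level. The names SegmentCountSketch and segmentsAfter are made up for illustration and do not exist in Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of segment accumulation. NOT Lucene code: a hypothetical
// simplification in which each flush creates one level-0 segment and
// `mergeFactor` segments at one level merge into one segment at the next.
final class SegmentCountSketch {

    // Returns how many segments remain after `flushes` flushes.
    // Pass Integer.MAX_VALUE as mergeFactor to model "never merge".
    static int segmentsAfter(int flushes, int mergeFactor) {
        // perLevel.get(i) is the number of live segments at level i.
        List<Integer> perLevel = new ArrayList<>();
        for (int f = 0; f < flushes; f++) {
            int level = 0;
            while (true) {
                if (perLevel.size() <= level) {
                    perLevel.add(0);
                }
                int count = perLevel.get(level) + 1;
                if (count == mergeFactor) {
                    // Enough segments at this level: merge them into one
                    // segment and carry it up to the next level.
                    perLevel.set(level, 0);
                    level++;
                } else {
                    perLevel.set(level, count);
                    break;
                }
            }
        }
        int total = 0;
        for (int c : perLevel) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        int flushes = 999;
        System.out.println("no merging:      " + segmentsAfter(flushes, Integer.MAX_VALUE));
        System.out.println("merge factor 10: " + segmentsAfter(flushes, 10));
        System.out.println("merge factor 30: " + segmentsAfter(flushes, 30));
    }
}
```

In this toy model, 999 flushes leave 999 segments when nothing merges, and 27 with a merge factor of 10. A larger merge factor defers merges, which reduces merge work during indexing but lets more small segments pile up at the lowest level before the first merge fires. That is the trade-off discussed in the thread.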
