lucene-java-user mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: improve indexing speed with nomergepolicy
Date Thu, 07 Aug 2014 14:11:24 GMT
Yes, currently an MP isn't a "live" setting on IndexWriter, meaning you
pass it at construction time and don't change it afterwards. I wonder if
after LUCENE-5711 we can move MergePolicy to LiveIndexWriterConfig and fix
IndexWriter to not hold on to it, but rather pull it from the config.

Not sure what others think about it.

Shai
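A minimal sketch of the workaround this implies today: since the MergePolicy is fixed when the IndexWriter is constructed, close the batch writer and open a second one with a fresh IndexWriterConfig. It assumes the Lucene 5.x-or-later API (NoMergePolicy.INSTANCE, the single-argument IndexWriterConfig constructor, FSDirectory.open(Path)); the class name, the temporary index directory, the "body" field, and the document count are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.NoMergePolicy;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Hypothetical example class; not part of Lucene.
class BatchThenMerge {
    public static void main(String[] args) throws IOException {
        // Hypothetical location: a fresh temporary directory for the index.
        Path indexPath = Files.createTempDirectory("example-index");

        try (Directory dir = FSDirectory.open(indexPath)) {
            // Phase 1: batch indexing with merging disabled.
            IndexWriterConfig batchConfig = new IndexWriterConfig(new StandardAnalyzer());
            batchConfig.setMergePolicy(NoMergePolicy.INSTANCE);
            try (IndexWriter writer = new IndexWriter(dir, batchConfig)) {
                for (int i = 0; i < 1000; i++) {
                    Document doc = new Document();
                    doc.add(new TextField("body", "document " + i, Field.Store.NO));
                    writer.addDocument(doc);
                }
            } // closing the writer commits the batch

            // Phase 2: a new writer with a new config and a real MergePolicy.
            // An IndexWriterConfig instance cannot be shared between writers,
            // so a second one is created here.
            IndexWriterConfig mergeConfig = new IndexWriterConfig(new StandardAnalyzer());
            mergeConfig.setMergePolicy(new TieredMergePolicy());
            try (IndexWriter writer = new IndexWriter(dir, mergeConfig)) {
                writer.forceMerge(1); // merge down before the index is "published"
            }
        }
    }
}
```

Any NRT readers obtained from the first writer have to be reopened against the second one, which is the main cost of this approach compared with a live setting.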


On Thu, Aug 7, 2014 at 5:05 PM, Jon Stewart <jon@lightboxtechnologies.com>
wrote:

> Related, how does one change the MergePolicy on an IndexWriter (e.g.,
> use NoMergePolicy during batch indexing, then change to something
> better once finished with batch)? It looks like the MergePolicy is set
> through IndexWriterConfig but I don't see a way to update an IWC on an
> IW.
>
> Thanks,
>
> Jon
>
>
> On Thu, Aug 7, 2014 at 7:37 AM, Shai Erera <serera@gmail.com> wrote:
> > Using NoMergePolicy for online indexes is usually not recommended. You
> > want to use NoMP in cases where you build an index in a batch job, then
> > in the end, before the index is "published", you run a forceMerge or
> > maybeMerge (with a real MergePolicy).
> >
> > For online indexes, i.e. indexes that are being searched while they are
> > updated, if you use NoMP you will accumulate many segments in the index.
> > This means higher resource consumption overall: file handles, RAM,
> > potentially disk space, and usually results in slower searches.
> >
> > You may want to tweak the default MP's settings though, to not kick off a
> > merge unless there are a large number of segments in the index. E.g. the
> > default MP merges segments when there are 10 at the same level (i.e.
> > roughly the same size). You can increase that.
> >
> > Also, do you use NRTCachingDirectory? It's usually recommended for NRT,
> > even with default MP, since the tiny segments are merged in-memory, and
> > your NRT reopens don't result in flushing new segments to disk.
> >
> > Shai
> >
> >
> > On Thu, Aug 7, 2014 at 1:14 PM, Sascha Janz <Sascha.Janz@gmx.net> wrote:
> >
> >> hi,
> >>
> >> i try to speed up our indexing process. we use SearcherManager with
> >> applydeletes to get near real time Reader.
> >>
> >> we have not really "much" incoming documents, but the documents must be
> >> updated from time to time and the amount of documents to be updated
> >> could be quite large.
> >>
> >> i tried some tests with NoMergePolicy and the indexing process was 25 %
> >> faster.
> >>
> >> so i think of a change in our code, to use NoMergePolicy for a specific
> >> time interval, when users are active and do a forceMerge(20) every
> >> night, which lasts about 2 - 5 minutes.
> >>
> >> is this a good idea? or will i perhaps get into trouble?
> >>
> >> Sascha
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
>
>
>
> --
> Jon Stewart, Principal
> (646) 719-0317 | jon@lightboxtechnologies.com | Arlington, VA
>
>
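To make the quoted point about segment accumulation concrete, here is a toy model in plain Java. It is not Lucene's merge logic: it assumes every flush produces one segment at level 0, and that as soon as mergeFactor segments share a level they are replaced by a single segment one level up. The class and method names are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of segment accumulation; NOT Lucene's actual merge algorithm.
class MergeSimulation {

    /**
     * Simulates the given number of flushes, each adding one level-0 segment.
     * A mergeFactor below 2 disables merging (the NoMergePolicy case).
     * Returns the number of segments left in the simulated index.
     */
    static int segmentsAfter(int flushes, int mergeFactor) {
        List<Integer> countPerLevel = new ArrayList<>();
        for (int i = 0; i < flushes; i++) {
            addSegment(countPerLevel, 0, mergeFactor);
        }
        int total = 0;
        for (int count : countPerLevel) {
            total += count;
        }
        return total;
    }

    // Adds one segment at the given level, cascading merges upward as needed.
    private static void addSegment(List<Integer> countPerLevel, int level, int mergeFactor) {
        while (countPerLevel.size() <= level) {
            countPerLevel.add(0);
        }
        int count = countPerLevel.get(level) + 1;
        if (mergeFactor > 1 && count == mergeFactor) {
            // mergeFactor segments at this level become one segment a level up.
            countPerLevel.set(level, 0);
            addSegment(countPerLevel, level + 1, mergeFactor);
        } else {
            countPerLevel.set(level, count);
        }
    }

    public static void main(String[] args) {
        System.out.println("no merging:     " + segmentsAfter(999, 0));  // 999
        System.out.println("merge at 10:    " + segmentsAfter(999, 10)); // 27
        System.out.println("merge at 30:    " + segmentsAfter(999, 30)); // 13
    }
}
```

In this model the segment count with merging stays logarithmic in the number of flushes, while without merging it grows linearly, which is where the extra file handles, RAM, and slower searches come from.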

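The two tuning suggestions in the quoted reply (raising the number of same-level segments that triggers a merge, and using NRTCachingDirectory) might be combined as in the sketch below. It assumes the Lucene 5.x-or-later API and that the default MergePolicy is TieredMergePolicy; the value 20 for segments per tier and the cache sizes (5 MB maximum merge size, 60 MB maximum cached) are illustrative choices, not recommendations from the thread, and the class name and index location are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.NRTCachingDirectory;

// Hypothetical example class; not part of Lucene.
class NrtWriterSetup {
    public static void main(String[] args) throws IOException {
        // Hypothetical location: a fresh temporary directory for the index.
        Path indexPath = Files.createTempDirectory("example-nrt-index");

        // Wrap the on-disk directory so that small, freshly flushed segments
        // can be held in RAM. Arguments: delegate, maxMergeSizeMB, maxCachedMB.
        try (Directory cachedDir =
                 new NRTCachingDirectory(FSDirectory.open(indexPath), 5.0, 60.0)) {

            // Allow more same-sized segments to pile up before a merge starts.
            TieredMergePolicy mergePolicy = new TieredMergePolicy();
            mergePolicy.setSegmentsPerTier(20.0); // illustrative value

            IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
            config.setMergePolicy(mergePolicy);

            try (IndexWriter writer = new IndexWriter(cachedDir, config)) {
                // Index and update documents here, then build the
                // SearcherManager on top of this writer for NRT reopens.
                writer.commit();
            }
        }
    }
}
```

A higher segments-per-tier value trades fewer merges during indexing for more segments at search time, so it is worth measuring both sides before settling on a number.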