lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Optimizing number of segments in lucene index (no writes/deletes, only reads)
Date Wed, 14 Jun 2017 17:20:56 GMT
Hi,

This article is still correct! Use the defaults of TieredMergePolicy; there is nothing more
to say.

The problems only start once you optimize/forceMerge for the first time and still update the
index afterwards. The index is then no longer structured in an optimal way: the huge
segment will "collect" deletes and never get merged away. So once you have manually force-merged,
the index will behave badly and you are forced to force-merge over and over.

So: never call forceMerge on an index that is still being updated, otherwise you break its
structure. If you have an unmodifiable/read-only index that never changes and will be completely
rebuilt from scratch on updates, forceMerge brings some speed improvement, but don't expect
too much. BUT: you also lose the ability to parallelize searches with an Executor on IndexSearcher!
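
To make the last point concrete, here is a toy model in plain Java. It is not Lucene code and
uses no Lucene API: each "segment" is just a list of terms, and the class and method names
(SegmentParallelismSketch, parallelTasks, countMatches) are made up for illustration. It
assumes the search hands one task per segment to the executor, so an index merged down to a
single segment leaves only one task, however many threads the pool has.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model only (an assumption for illustration, NOT Lucene's implementation):
// a "segment" is a list of terms, and a search submits one task per segment.
public class SegmentParallelismSketch {

    // Number of independent tasks a per-segment search can hand to an executor.
    public static int parallelTasks(List<List<String>> segments) {
        return segments.size();
    }

    // Counts occurrences of a term, submitting one task per segment and summing the partial counts.
    public static long countMatches(List<List<String>> segments, String term, ExecutorService executor)
            throws InterruptedException, ExecutionException {
        List<Future<Long>> partials = new ArrayList<>();
        for (List<String> segment : segments) {
            Callable<Long> task = () -> segment.stream().filter(term::equals).count();
            partials.add(executor.submit(task));
        }
        long total = 0;
        for (Future<Long> partial : partials) {
            total += partial.get();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            // Same terms, laid out as three segments versus one force-merged segment.
            List<List<String>> tiered = Arrays.asList(
                    Arrays.asList("bremen", "berlin"),
                    Arrays.asList("bremen"),
                    Arrays.asList("hamburg"));
            List<List<String>> merged = Arrays.asList(
                    Arrays.asList("bremen", "berlin", "bremen", "hamburg"));

            System.out.println("tiered: tasks=" + parallelTasks(tiered)
                    + " matches=" + countMatches(tiered, "bremen", executor));
            System.out.println("merged: tasks=" + parallelTasks(merged)
                    + " matches=" + countMatches(merged, "bremen", executor));
        } finally {
            executor.shutdown();
        }
    }
}
```

Both layouts return the same match count; only the number of tasks that can run concurrently
differs, which is the trade-off to weigh before force-merging a read-only index.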

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Riccardo Tasso [mailto:riccardo.tasso@gmail.com]
> Sent: Wednesday, June 14, 2017 8:34 AM
> To: Lucene Users <java-user@lucene.apache.org>
> Subject: Re: Optimizing number of segments in lucene index (no
> writes/deletes, only reads)
> 
> Hi,
>  I recently read this post; I think it will give you some hints:
> 
> http://blog.trifork.com/2011/11/21/simon-says-optimize-is-bad-for-you/
> 
> Probably the only advantage of having one huge segment is to use less disk
> space.
> 
> Riccardo
> 
> 2017-06-14 5:23 GMT+02:00 Tom Hirschfeld <tomhirschfeld@gmail.com>:
> 
> > Hello Fellow Lucene-eers,
> >
> > I have a Lucene 6.5.1 app primarily indexed/searched via
> > LatLonDocValuesField. The index is built once, and has no writes/deletes in
> > production. At indexing time, we need to select the number of segments we
> > want to generate, and it is unclear to us how many segments we should
> > generate if we are optimizing for query speed. My intuition says that we
> > should only generate 1 segment as we will have no writes/deletes, but I
> > cannot find any hard evidence online to support or refute that hypothesis.
> > Does anyone here know how many segments we should use? 1 segment? 1 segment
> > per CPU in prod? 1 segment per core in prod? Something else?
> >
> > Best,
> > Tom Hirschfeld
> >



