lucene-java-user mailing list archives

From Luís Filipe Nassif <lfcnas...@gmail.com>
Subject RE: Optimizing number of segments in lucene index (no writes/deletes, only reads)
Date Wed, 14 Jun 2017 23:39:04 GMT
In the past I tried IndexSearcher with an ExecutorService to
parallelize searches across multiple segments on an SSD. That was with
Lucene 4.9. Unfortunately, searches became slower with various numbers
of threads in the pool, and much slower with 1 thread. Was there some
overhead with that approach? Has it been improved in later versions?

Thanks,
Luis

On Jun 14, 2017 2:21 PM, "Uwe Schindler" <uwe@thetaphi.de> wrote:

Hi,

This article is still very correct! Use the defaults of TieredMergePolicy,
nothing more to say.

The problems only start once you optimize/forceMerge for the first time and
still update the index afterwards, because then the index is no longer
structured in an optimal way: the huge segment will "collect" deletes and
never get merged away. So once you have manually force-merged, the index
will behave badly and you are forced to force merge over and over.

So: Never ever call forceMerge on an index that is still being updated,
otherwise you break its structure. If you have an unmodifiable/read-only
index that never ever changes and will be completely rebuilt from scratch
on updates, forceMerge brings some speed improvement, but don't expect too
much. BUT: You also lose the ability to parallelize searches with an
Executor on IndexSearcher!
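
To illustrate the last point, here is a minimal stdlib-only sketch. It is a
toy model, not Lucene code: a "segment" is just an int[] and a "query" is a
threshold test, and the names SegmentFanOut, countMatches and mergeToOne are
made up for this example. It assumes the executor gets roughly one unit of
work per segment (the exact grouping in a real IndexSearcher varies by Lucene
version), so after merging to a single segment there is only one task left
to hand out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model only: none of these names are Lucene classes.
final class SegmentFanOut {

    // Number of per-segment tasks submitted by the most recent search.
    static int lastTaskCount = 0;

    // Counts values >= threshold, submitting one task per segment.
    static long countMatches(List<int[]> segments, int threshold, ExecutorService pool) {
        List<Future<Long>> partials = new ArrayList<>();
        for (int[] segment : segments) {
            Callable<Long> task = () -> {
                long hits = 0;
                for (int v : segment) {
                    if (v >= threshold) hits++;
                }
                return hits;
            };
            partials.add(pool.submit(task));
        }
        lastTaskCount = partials.size();
        long total = 0;
        try {
            // Combine the per-segment partial results.
            for (Future<Long> f : partials) total += f.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
        return total;
    }

    // Models a merge down to one segment: same values, one array.
    static List<int[]> mergeToOne(List<int[]> segments) {
        int size = 0;
        for (int[] s : segments) size += s.length;
        int[] merged = new int[size];
        int pos = 0;
        for (int[] s : segments) {
            System.arraycopy(s, 0, merged, pos, s.length);
            pos += s.length;
        }
        List<int[]> out = new ArrayList<>();
        out.add(merged);
        return out;
    }

    public static void main(String[] args) {
        List<int[]> segments = new ArrayList<>();
        segments.add(new int[] {1, 5, 9});
        segments.add(new int[] {2, 6});
        segments.add(new int[] {7, 8, 3, 4});
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            long multi = countMatches(segments, 5, pool);
            int multiTasks = lastTaskCount;
            long single = countMatches(mergeToOne(segments), 5, pool);
            int singleTasks = lastTaskCount;
            System.out.println(multi + " hits from " + multiTasks + " tasks");
            System.out.println(single + " hits from " + singleTasks + " task");
        } finally {
            pool.shutdown();
        }
    }
}
```

Both searches return the same hits; only the number of tasks available to
the pool differs. The model also hints at why a pool can make things slower:
every task carries submit/collect overhead, which only pays off when each
segment has enough work in it.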

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Riccardo Tasso [mailto:riccardo.tasso@gmail.com]
> Sent: Wednesday, June 14, 2017 8:34 AM
> To: Lucene Users <java-user@lucene.apache.org>
> Subject: Re: Optimizing number of segments in lucene index (no
> writes/deletes, only reads)
>
> Hi,
> I recently read this post; I think it will give you some hints:
>
> http://blog.trifork.com/2011/11/21/simon-says-optimize-is-bad-for-you/
>
> Probably the only advantage of having one huge segment is to use less disk
> space.
>
> Riccardo
>
> 2017-06-14 5:23 GMT+02:00 Tom Hirschfeld <tomhirschfeld@gmail.com>:
>
> > Hello Fellow Lucene-eers,
> >
> > I have a lucene 6.5.1 app primarily indexed/searched via the
> > latLonDocValuesField. The index is built once, and has no writes/deletes
> > in production. At indexing time, we need to select the number of segments
> > we want to generate, and it is unclear to us how many segments we should
> > generate if we are optimizing for query speed. My intuition says that we
> > should only generate 1 segment as we will have no writes/deletes, but I
> > cannot find any hard evidence online to support or refute that hypothesis.
> > Does anyone here know how many segments we should use? 1 segment? 1
> > segment per cpu in prod? 1 segment per core in prod? Something else?
> >
> > Best,
> > Tom Hirschfeld
> >


