lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Some segments in compound format, others not
Date Wed, 24 Aug 2016 09:09:23 GMT
Hi Oliver,

The default behavior of Lucene (well, of TieredMergePolicy, the default
merge policy) is to *not* create a compound file for segments that are > 10%
of the total index size at the moment that segment was written.

This way small segments don't use up many file descriptors, while large
segments don't pay the (smallish) index-time cost of creating the compound
file.

See MergePolicy.setNoCFSRatio to change this.
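
To make the rule concrete, here is a minimal sketch of the size check in plain Java. It is a simplified model and not Lucene's actual code: the class name, method name, and parameters are made up for illustration, and it ignores other settings that can also affect the decision. It assumes that a ratio of 0.0 means "never compound" and a ratio of 1.0 or more means "always compound".

```java
/**
 * Simplified, illustrative model of the compound-file decision.
 * The names here are hypothetical; this is not the Lucene implementation.
 */
final class CompoundFileRule {

    /** The 10% default mentioned above, expressed as a ratio. */
    static final double DEFAULT_NO_CFS_RATIO = 0.1;

    /**
     * Returns true if a segment of the given size would be written in
     * compound format, given the total index size at that moment.
     */
    static boolean useCompoundFile(long segmentBytes, long totalIndexBytes, double noCFSRatio) {
        if (noCFSRatio <= 0.0) {
            return false; // assumption: ratio 0.0 disables compound files
        }
        if (noCFSRatio >= 1.0) {
            return true;  // assumption: ratio 1.0 makes every segment compound
        }
        // Small segments (at most ratio * total index size) become compound.
        return segmentBytes <= noCFSRatio * totalIndexBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long total = 1000 * mb; // a hypothetical 1000 MB index

        System.out.println(useCompoundFile(50 * mb, total, DEFAULT_NO_CFS_RATIO));  // true
        System.out.println(useCompoundFile(200 * mb, total, DEFAULT_NO_CFS_RATIO)); // false
        System.out.println(useCompoundFile(200 * mb, total, 1.0));                  // true
    }
}
```

Because the cutoff is a fraction of the total index size, it moves with the index, which is why the "limit" you see differs from one index to the next. If file handles are the main concern, passing 1.0 to setNoCFSRatio should push all merged segments into compound format, at the cost of the extra index-time work on the large ones.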

Mike McCandless

http://blog.mikemccandless.com

On Wed, Aug 24, 2016 at 4:10 AM, Oliver Kaleske <Oliver.Kaleske@ptvgroup.com> wrote:

> Hi,
>
> I noticed on some of my Lucene indexes that they consist of both segments
> in compound format and segments in non-compound format.
> There appears to be a pattern such that smaller segments (in terms of disk
> storage) are in compound format, and larger segments are non-compound.
> However, in one index the largest compound segments are on the order of 10
> megabytes, in another they are all around 35 megabytes, and 50 megabytes
> for yet another, so the "limit" between the two seems to show some sort of
> scaling behavior. In each case, the non-compound segments are roughly a
> factor of 10 larger.
>
> I failed to find any explanation in the documentation.
> https://lucene.apache.org/core/6_1_0/core/org/apache/lucene/codecs/lucene60/package-summary.html
> ("When using the Compound File format (default in 1.4 and greater) ...")
> actually led me to assume there would be only either compound or
> non-compound segments, but not a mixture of both.
>
> Is this behavior intentional? Is there an explanation digestible for
> non-experts? Can the behavior be modified somehow?
> I'm asking both out of curiosity and concern as I'm involved in
> integrating Lucene into a larger server system whose many components have
> in the past occasionally hit file handle limits, so compound format ("for
> systems that frequently run out of file handles") seems like a Good Thing
> here.
>
> Thanks in advance and best regards,
> Oliver