lucene-java-user mailing list archives

From Oliver Kaleske
Subject Re: Some segments in compound format, others not
Date Wed, 24 Aug 2016 09:37:11 GMT
Hi Mike,

thanks a lot for your explanation. I don't know yet whether I'll ever run into actual trouble
with file handles, but just in case, it's good to know that setNoCFSRatio still allows for
some tuning.

Thanks again.


From: Michael McCandless
Sent: Wednesday, August 24, 2016 11:09
To: Lucene Users; Oliver Kaleske
Subject: Re: Some segments in compound format, others not

Hi Oliver,

The default behavior of Lucene (well, TieredMergePolicy, the default merge policy) is to *not*
create a compound file for segments that are > 10% of the total index size at the moment
that segment was written.

This way small segments don't use up many file descriptors, while large segments don't pay
the (smallish) index-time cost of creating the compound file.

See MergePolicy.setNoCFSRatio to change this.
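
To make the rule concrete, here is a minimal sketch in plain Java. It is a simplified model of the decision described above, not Lucene's actual code: the class name, method name, and parameters are hypothetical, and the handling of a ratio of 0.0 (never compound) and 1.0 (always compound) is an assumption about how the setting behaves at its extremes.

```java
/**
 * Simplified model of the compound-file decision described above.
 * Hypothetical names; this is NOT Lucene's implementation.
 */
public final class CfsRuleSketch {

    /**
     * @param segmentBytes    size of the segment being written
     * @param totalIndexBytes total size of the index at that moment
     * @param noCfsRatio      fraction of the index size above which a
     *                        segment is left in non-compound format
     * @return true if the segment would be written as a compound file
     */
    public static boolean useCompoundFile(long segmentBytes,
                                          long totalIndexBytes,
                                          double noCfsRatio) {
        // Assumption: a ratio of 0.0 turns compound files off entirely.
        if (noCfsRatio <= 0.0) {
            return false;
        }
        // Assumption: a ratio of 1.0 makes every segment compound.
        if (noCfsRatio >= 1.0) {
            return true;
        }
        // Small segments (relative to the index) become compound files;
        // large ones skip the extra index-time cost.
        return segmentBytes <= noCfsRatio * totalIndexBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long totalIndex = 500 * mb; // hypothetical index size
        double ratio = 0.1;         // the 10% threshold mentioned above

        for (long segment : new long[] {10 * mb, 40 * mb, 60 * mb, 400 * mb}) {
            System.out.printf("%d MB segment: compound=%b%n",
                    segment / mb, useCompoundFile(segment, totalIndex, ratio));
        }
    }
}
```

Under this model the cutoff scales with the index, which would match the observation that the largest compound segments differ from index to index. Raising the ratio moves more segments into compound format and so reduces the number of open files.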

Mike McCandless

On Wed, Aug 24, 2016 at 4:10 AM, Oliver Kaleske wrote:

I noticed on some of my Lucene indexes that they consist of both segments in compound format
and segments in non-compound format.
There appears to be a pattern such that smaller segments (in terms of disk storage) are in
compound format, and larger segments are non-compound.
However, in one index the largest compound segments are on the order of 10 megabytes, in another
they are all around 35 megabytes, and 50 megabytes for yet another, so the "limit" between
the two seems to show some sort of scaling behavior. In each case, the non-compound segments
are roughly a factor of 10 larger.

I failed to find any explanation in the documentation. The statement there
("When using the Compound File format (default in 1.4 and greater) ...") actually led me to
assume there would be only either compound or non-compound segments, but not a mixture of
both.
Is this behavior intentional? Is there an explanation digestible for non-experts? Can the
behavior be modified somehow?
I'm asking both out of curiosity and concern as I'm involved in integrating Lucene into a
larger server system whose many components have in the past occasionally hit file handle limits,
so compound format ("for systems that frequently run out of file handles") seems like a Good
Thing here.

Thanks in advance and best regards,
