lucene-java-user mailing list archives

From Michael McCandless
Subject Re: How to avoid huge index files
Date Thu, 10 Sep 2009 09:06:33 GMT
First, you need to limit the size of segments initially created by
IndexWriter due to newly added documents.  Probably the simplest way
is to call IndexWriter.commit() frequently enough.  You might want to
use IndexWriter.ramSizeInBytes() to gauge how much RAM is currently
consumed by IndexWriter's buffer to determine when to commit.  But it
won't be an exact science, i.e., the segment size on disk will differ
from the RAM buffer size.  So, experiment with it.
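A minimal sketch of that commit-by-RAM loop follows.  To keep it
compilable without Lucene on the classpath, it uses a hypothetical
RamTrackedWriter interface standing in for the two IndexWriter methods
named above (ramSizeInBytes() and commit()); the FakeWriter class, the
threshold and the document sizes are illustrative assumptions, not
Lucene code.

```java
// Hypothetical stand-in for the two IndexWriter methods discussed above.
// With real Lucene you would call them on IndexWriter directly (there,
// commit() also declares IOException, omitted here for brevity).
interface RamTrackedWriter {
    long ramSizeInBytes();
    void commit();
}

final class CommitByRam {
    /** Commits once the buffered RAM reaches thresholdBytes; returns true if it committed. */
    static boolean commitIfOver(RamTrackedWriter writer, long thresholdBytes) {
        if (writer.ramSizeInBytes() >= thresholdBytes) {
            writer.commit();
            return true;
        }
        return false;
    }

    /** In-memory fake used only to exercise the loop; not part of Lucene. */
    static final class FakeWriter implements RamTrackedWriter {
        private long buffered;
        int commits;

        void addDocument(long bytes) { buffered += bytes; }
        public long ramSizeInBytes() { return buffered; }
        public void commit() { commits++; buffered = 0; }
    }

    public static void main(String[] args) {
        FakeWriter writer = new FakeWriter();
        long threshold = 1_000_000L; // assumed starting point; tune by experiment
        for (int i = 0; i < 10; i++) {
            writer.addDocument(300_000L);    // pretend each document buffers ~300 KB
            commitIfOver(writer, threshold); // check after every add
        }
        System.out.println("commits: " + writer.commits);
    }
}
```

Since the on-disk segment size differs from the RAM figure, the
threshold is only a starting point: check the resulting file sizes and
adjust it.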

Second, you need to prevent merging from creating a segment that's too
large.  For this I would use the setMaxMergeMB method of the
LogByteSizeMergePolicy (which is IndexWriter's default merge policy).
But note that this max size applies to the *input* segments, so you'd
roughly want that to be 1.0 MB (your 10.0 MB divided by the merge
factor = 10), but probably make it smaller to be sure things stay
small enough.
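The arithmetic above can be sketched as follows.  The MergeSizing class
and the safetyFraction parameter are illustrative assumptions (the
latter models "make it smaller to be sure"); only the division by the
merge factor comes from the reasoning above.

```java
final class MergeSizing {
    /**
     * Rough upper bound, in MB, for the size of each *input* segment so that
     * a merge of mergeFactor such segments stays near maxFileMB.
     * safetyFraction in (0, 1] shrinks the result to leave headroom.
     */
    static double maxMergeMB(double maxFileMB, int mergeFactor, double safetyFraction) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        if (safetyFraction <= 0.0 || safetyFraction > 1.0) {
            throw new IllegalArgumentException("safetyFraction must be in (0, 1]");
        }
        return maxFileMB / mergeFactor * safetyFraction;
    }

    public static void main(String[] args) {
        // 10.0 MB limit, merge factor 10, no headroom
        System.out.println(maxMergeMB(10.0, 10, 1.0));
        // Same limit with half the budget held back as headroom
        System.out.println(maxMergeMB(10.0, 10, 0.5));
    }
}
```

The resulting value is what you would hand to setMaxMergeMB on the
writer's LogByteSizeMergePolicy.  It bounds the merge inputs only, so
still check the actual file sizes on disk.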

Note that with this approach, if your index is large enough, you'll
wind up with many segments and search performance will suffer when
compared to an index that doesn't have this 10.0 MB maximum file size.


On Thu, Sep 10, 2009 at 2:32 AM, Dvora wrote:
> Hello again,
> Can someone please comment on that, whether what I'm looking for is
> possible or not?
> Dvora wrote:
>> Hello,
>> I'm using Lucene 2.4. I'm developing a web application that uses Lucene
>> (via Compass) to do the searches.
>> I intend to deploy the application on Google App Engine, which limits
>> file length to less than 10MB. I've read about the various policies
>> supported by Lucene to limit the file sizes, but no matter which policy
>> I used and with which parameters, the index files still grew to be a lot
>> more than 10MB.
>> Looking at the code, I've managed to limit the cfs files (predicting the
>> file size in CompoundFileWriter before closing the file) - I guess that
>> will degrade performance, but it's OK for now. But now the FDT files are
>> becoming huge (about 60MB) and I can't identify a way to limit those
>> files.
>> Is there some built-in and correct way to limit the length of these
>> files? If not, can someone please point me to how I should tweak the
>> source code to achieve that?
>> Thanks for any help.
