lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ian Lea <>
Subject Re: OutOfMemoryError
Date Thu, 29 Nov 2001 12:10:55 GMT
Doug sent the message below to the list on 3-Nov in response to
a query about file size limits.  There may have been more
related stuff on the thread as well.


>   *** Anyway, is there anyway to control how big the indexes 
> grow ? ****

The easiset thing is to set IndexWriter.maxMergeDocs. Since you hit 2GB at
8M docs, set this to 7M.  That will keep Lucene from trying to merge an
index that won't fit in your filesystem.  (It will actually effectively
round this down to the next lower power of Index.mergeFactor.  So with the
default mergeFactor=10, maxMergeDocs=7M will generate a series of 1M
document indexes, since merging 10 of these would exceed the max.)

Slightly more complex: you could further minimize the number of segments,
if, when you've added seven million documents, optimize the index and start
a new index.  Then use MultiSearcher to search.

Even more complex and optimal: write a version of FSDirectory that, when a
file exceeds 2GB, creates a subdirectory and represents the file as a series
of files.  (I've done this before, and found that, on at least the version
of Solaris that I was using, the files had to be a few 100k less than 2GB
for programs like 'cp' and 'ftp' to operate correctly on them.)


Chantal Ackermann wrote:
> hi Ian, hi Winton, hi all,
> sorry I meant heap size of 100Mb. I'm  starting java with -Xmx100m. I'm not
> setting -Xms.
> For what I know now, I had a bug in my own code. still I don't understand
> where these OutOfMemoryErrors came from. I will try to index again in one
> thread without RAMDirectory just to check if the program is sane.
> the problem that the files get to big while merging remains. I wonder why
> there is not the possibility to tell lucene not to create files that are
> bigger than the system limit. how am i supposed to know after how many
> documents this limit is reached? lucene creates the documents - i just know
> the average size of a piece of text that is the input for a document. or am I
> missing something?!
> chantal

To unsubscribe, e-mail:   <>
For additional commands, e-mail: <>

View raw message