lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <>
Subject Re: best practice on too many files vs IO overhead
Date Fri, 27 Nov 2009 10:37:21 GMT
Are you sure you're closing all readers that you're opening?

It's surprising with normal usage of Lucene that you'd run out of
descriptors, with its default mergeFactor (have you increased the

You can also enable compound file, which uses far fewer file
descriptors, at some cost to indexing performance.

Also, a partial optimize (ie optimize(N)) does less IO but still
substantially reduces segment count of the index.


On Fri, Nov 27, 2009 at 4:23 AM, Istvan Soos <> wrote:
> Hi,
> I've a requirement that involves frequent, batched update of my Lucene
> index. This is done by a memory queue and process that periodically
> wakes and process that queue into the Lucene index.
> If I do not optimize my index, I'll receive "too many open files"
> exception (yeah, right, I can get the OS's limit up a bit, but that
> just prolongs the exception).
> If I do optimize my index, I'll receive a very large IO overhead (as
> it reads again and writes the whole index).
> Right now I'm optimizing the index on each batch cycle, but as my
> index size quickly goes to around 1GB, I experience great overhead in
> the IO operations. The update shall happen frequently (1-10 times per
> minute), so I'm looking for advices how to solve this issue. I might
> split the index, but that way I'll receive the "too many open files"
> sooner, and in the end the IO overhead remains...
> Any suggestions?
> Thanks,
>   Istvan
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message