lucene-dev mailing list archives

From Doug Cutting <cutt...@lucene.com>
Subject Re: Index Optimization space requirements
Date Mon, 04 Nov 2002 18:17:27 GMT

Konrad Scherer wrote:
> I am using lucene 1.2 (Java 1.4 on Solaris 7) and the xml indexer to 
> index ~24000 small xml documents. The finished and optimized index uses 
> around 340 MB disk space. The documents are reindexed once a week and 
> this has worked without any trouble for months. Recently the free space 
> on the hard drive was down to 1.36 GB and the optimization crashed due 
> to "no space left on device". Deleting the index directory freed up 1.36 
> GB.
> Question 1) Is it normal for the optimization process to require this 
> much extra space?

Optimization (and indexing in general) works by copying, so it requires 
around twice the space that the index occupies when optimized.  Due to 
file fragmentation and to the index format, an unoptimized index will 
actually occupy slightly more space than the same index when optimized.

Another complication: if the index is being searched while it is 
modified and optimized, there can be three copies on disk: the one 
being searched, the penultimate copy from just before optimization, 
and, finally, the optimized index.  Things get even worse if many 
searchers were opened on different versions of the index.  As long as 
any version of the index is open, its space cannot be freed.
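
As a rough illustration of the copy arithmetic above, the following Java 
sketch checks whether a volume has room before optimizing.  It is not part 
of Lucene: the class and method names are hypothetical, and the "extra 
copies" multiplier (1 with no searchers open, 2 with one searcher open on 
an older version) is only one reading of the estimates above.  It also 
assumes Java 6 or later for File.getUsableSpace(), so it would not run 
unchanged on Java 1.4.

```java
import java.io.File;

/** Hypothetical pre-flight check; an estimate, not a guarantee. */
final class OptimizeSpaceCheck {

    /** Sums the sizes of the regular files directly inside dir. */
    static long directorySize(File dir) {
        long total = 0;
        File[] files = dir.listFiles();
        if (files == null) {
            return 0; // not a directory, or not readable
        }
        for (File f : files) {
            if (f.isFile()) {
                total += f.length();
            }
        }
        return total;
    }

    /** Free bytes assumed necessary: one index-sized copy per extra copy. */
    static long requiredFreeBytes(long indexBytes, int extraCopies) {
        return indexBytes * extraCopies;
    }

    /** True if the volume holding indexDir appears to have enough room. */
    static boolean hasRoom(File indexDir, int extraCopies) {
        long needed = requiredFreeBytes(directorySize(indexDir), extraCopies);
        return indexDir.getUsableSpace() >= needed;
    }
}
```

Calling hasRoom(indexDir, 1) before optimizing covers the plain copying 
case; passing 2 or more allows for older versions still held open by 
searchers.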

> 2) Did I miss an option somewhere to limit the space usage of the 
> optimization process?

If you're searching concurrently, try closing searchers more promptly. 
You should only need to keep a single searcher open at a time, shared by 
all queries.
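
A minimal sketch of that pattern in Java, assuming a generic holder rather 
than any Lucene class: SharedSearcherHolder is a hypothetical name, and it 
works with any java.io.Closeable, so a real searcher would need a thin 
wrapper if it does not implement that interface.  The holder hands the 
same instance to every query and closes the previous instance as soon as a 
new one is swapped in, so the old index version's space can be released.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical holder for the one searcher shared by all queries. */
final class SharedSearcherHolder<T extends Closeable> {
    private T current;

    SharedSearcherHolder(T initial) {
        this.current = initial;
    }

    /** Returns the shared instance; callers must not close it themselves. */
    synchronized T get() {
        return current;
    }

    /** Installs a freshly opened instance and closes the old one at once. */
    synchronized void swap(T fresh) throws IOException {
        T old = current;
        current = fresh;
        if (old != null && old != fresh) {
            old.close(); // lets the old index version's files be freed
        }
    }
}
```

This sketch does not wait for queries still running on the old instance; 
an application with long-running queries would need reference counting or 
a similar guard before closing.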

> 3) More philosophically, do I really need the optimization?

It will definitely make searches faster, but if search performance is 
not an issue, I wouldn't bother.

Doug

