lucene-dev mailing list archives

From Christoph Goller <gol...@detego-software.de>
Subject Re: optimized disk usage when creating a compound index
Date Sun, 08 Aug 2004 13:16:26 GMT
Bernhard Messer wrote:
> Hi Christoph,
> 
> just reviewed the TestCompoundFile.java and you were absolutely right 
> when saying that the test will fail on Windows. Now the test is changed 
> in a way that a second file with identical data is created. This file 
> can be used in the testcases to make the comparisons against the 
> compound store. Now the modified test runs fine on Microsoft and Linux 
> platforms.
> 
> In the attachment you'll find the new TestCompoundFile source.
> 
> hope this helps
> Bernhard

Hi Bernhard,

I reconsidered your changes again.
The problem that is solved is the following:

If compound files are used, Lucene needs up to 3 times the disk space (during
indexing) that is required by the final index. The reason is that during a
merge of mergeFactor segments, these segments are doubled by merging them into a
new one and then the new segment is doubled again while generating its compound
file.
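
To make the factor of 3 concrete, here is a minimal sketch of the arithmetic. It is only a model, not Lucene code: the class and method names are hypothetical, sizes are in arbitrary units, and it assumes the worst case where the whole index is merged and the merged segment is as large as its sources.

```java
// Hypothetical model of disk usage during a merge; makes no Lucene calls.
final class MergeDiskUsage {

    /** Peak usage when the old segments stay on disk until the compound file exists. */
    static long peakWithLateDelete(long indexSize) {
        long onDisk = indexSize;          // the mergeFactor source segments
        long peak = onDisk;

        onDisk += indexSize;              // merged segment written as individual files
        peak = Math.max(peak, onDisk);

        onDisk += indexSize;              // compound file duplicates the merged segment
        peak = Math.max(peak, onDisk);

        onDisk -= 2 * indexSize;          // only now: old segments and individual files go
        return peak;                      // onDisk is back to indexSize at this point
    }

    public static void main(String[] args) {
        long size = 100;
        System.out.println("peak factor: " + peakWithLateDelete(size) / size);
    }
}
```

Under these assumptions the peak is three times the final index size, reached just after the compound file has been written.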

You solved the problem by deleting individual files from a segment earlier while
building the compound file. However, this means that the CompoundFileWriter in
its close operation now deletes files. This is not necessarily what one expects
if one uses a CompoundFileWriter. It should only generate a compound file, not 
delete the original files. Therefore you had to change CompoundFileWriter tests
accordingly!

My idea now is to change IndexWriter so that during a merge all old segments are
deleted before the compound file is generated. I think this also avoids the
factor of 3 and gives a maximum factor of 2 concerning disk space. I committed my
changes. Could you run the same test you did with your patch to verify that my
changes have the desired outcome too? That would be great.
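
The effect of the ordering can be sketched as follows. This is an illustrative model only, not the committed IndexWriter change: the class, the method names and the step sequences are assumptions, each step is reduced to a size delta, and the merged segment is assumed to be as large as the segments it replaces.

```java
// Hypothetical comparison of two step orderings; makes no Lucene calls.
final class MergeOrderSketch {

    /** Returns the highest disk usage seen while applying the size deltas in order. */
    static long peak(long start, long... deltas) {
        long onDisk = start;
        long peak = start;
        for (long delta : deltas) {
            onDisk += delta;
            peak = Math.max(peak, onDisk);
        }
        return peak;
    }

    /** Merge, build compound file, then delete old segments and individual files. */
    static long lateDelete(long s) {
        return peak(s, +s, +s, -s, -s);
    }

    /** Merge, delete old segments, build compound file, delete individual files. */
    static long earlyDelete(long s) {
        return peak(s, +s, -s, +s, -s);
    }

    public static void main(String[] args) {
        long s = 100;
        System.out.println("late delete peak factor:  " + lateDelete(s) / s);
        System.out.println("early delete peak factor: " + earlyDelete(s) / s);
    }
}
```

In this model, deleting the old segments before generating the compound file caps the peak at twice the final index size, while deleting them afterwards reaches three times.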

Christoph



