lucene-dev mailing list archives

From Bernhard Messer <Bernhard.Mes...@intrafind.de>
Subject Re: optimized disk usage when creating a compound index
Date Sun, 08 Aug 2004 14:14:51 GMT
Christoph,

very clever implementation, and bad news for all disk manufacturers ;-).
The patch works as expected and reduces the maximum disk usage in the
same way as announced in the first message introducing this patch.

thanks
Bernhard

Christoph Goller wrote:

> Bernhard Messer wrote:
>
>> Hi Christoph,
>>
>> just reviewed the TestCompoundFile.java and you were absolutely right
>> when saying that the test will fail on Windows. Now the test is
>> changed in a way that a second file with identical data is created.
>> This file can be used in the test cases to make the comparisons
>> against the compound store. Now the modified test runs fine on
>> Microsoft and Linux platforms.
>>
>> In the attachment you'll find the new TestCompoundFile source.
>>
>> hope this helps
>> Bernhard
>
>
> Hi Bernhard,
>
> I reconsidered your changes again.
> The problem that is solved is the following:
>
> If compound files are used, Lucene needs up to 3 times the disk space
> (during indexing) that is required by the final index. The reason is
> that during a merge of mergeFactor segments, these segments are doubled
> by merging them into a new one and then the new segment is doubled
> again while generating its compound file.
>
> You solved the problem by deleting individual files from a segment
> earlier while building the compound file. However, this means that the
> CompoundFileWriter in its close operation now deletes files. This is
> not necessarily what one expects if one uses a CompoundFileWriter. It
> should only generate a compound file, not delete the original files.
> Therefore you had to change CompoundFileWriter tests accordingly!
>
> My idea now is to change IndexWriter so that during merge all old
> segments are deleted before the compound file is generated. I think
> that I also avoid the factor of 3 and get a maximum factor of 2
> concerning disk space. I committed my changes. Could you do a test as
> you did with your patch to verify if my changes have the desired
> outcome too? That would be great,
>
> Christoph
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>
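
The disk-space arithmetic in the quoted message can be sketched as
follows. This is a simplified model, not Lucene code: it assumes the
merged segment and its compound file are each the same size as the
segments being merged, and the class, method, and constant names are
hypothetical.

```java
class DiskUsageSketch {
    // Each entry is a change in disk usage, measured in units of the
    // final index size (assumption: old segments, merged segment, and
    // compound file all have the same size).

    // Merge, write compound file, then delete old segments and the
    // merged segment's individual files.
    static final int[] COMPOUND_BEFORE_DELETE = {+1, +1, -1, -1};

    // Merge, delete old segments, write compound file, then delete the
    // merged segment's individual files.
    static final int[] DELETE_BEFORE_COMPOUND = {+1, -1, +1, -1};

    // Returns the peak disk usage as a multiple of the final index size.
    static int peakFactor(int[] steps) {
        int current = 1; // the old segments are already on disk
        int peak = current;
        for (int delta : steps) {
            current += delta;
            peak = Math.max(peak, current);
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println("compound file written before old segments are deleted: factor "
                + peakFactor(COMPOUND_BEFORE_DELETE));
        System.out.println("old segments deleted before compound file is written:  factor "
                + peakFactor(DELETE_BEFORE_COMPOUND));
    }
}
```

Under these assumptions the first ordering peaks at a factor of 3 and
the second at a factor of 2, matching the figures given in the quoted
message.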

