lucene-dev mailing list archives

From Bernhard Messer <Bernhard.Mes...@intrafind.de>
Subject Re: optimized disk usage when creating a compound index
Date Tue, 10 Aug 2004 07:30:57 GMT
Christoph,

It doesn't matter which solution you choose. The original idea was to try
some optimizations that could be implemented simply and would have as
few negative side effects as possible. I think the first patch achieves
this. The complexity of the overall system will grow anyway, so my
personal opinion is to start with a simple solution. If it works, we
can come back later, put more effort into optimizing it further, and
accept the added complexity then.
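
To make the discussion concrete, here is a rough sketch of the idea behind
the first patch as it is described in the quoted mail below: each
non-compound file is deleted right after it has been copied into the
compound file. This is not the patch itself. The class and method names are
made up for illustration, it uses plain java.nio.file instead of Lucene's
Directory API, and it simply concatenates the files without the entry table
a real compound file needs.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT Lucene's CompoundFileWriter.
final class EagerDeleteCompoundSketch {

    /**
     * Appends each part to the compound file and deletes the part as soon as
     * it has been copied. Returns the parts that could not be deleted (for
     * example because a reader still holds them open on Windows), so the
     * caller can retry later.
     */
    static List<Path> mergeAndDelete(List<Path> parts, Path compound) throws IOException {
        List<Path> undeleted = new ArrayList<>();
        try (OutputStream out = Files.newOutputStream(compound)) {
            for (Path part : parts) {
                Files.copy(part, out);   // copy the whole part into the compound stream
                out.flush();
                try {
                    Files.delete(part);  // free the disk space immediately
                } catch (IOException e) {
                    undeleted.add(part); // keep the name for a later retry
                }
            }
        }
        return undeleted;
    }
}
```

The weak point is the one Dmitry mentions: if the copy fails halfway, the
parts that were already deleted are gone, so the new segment cannot be used
in its non-compound form either.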

best regards
Bernhard

goller@detego-software.de wrote:

>Dmitry Serebrennikov <dmitrys@earthlink.net> schrieb am 09.08.2004,
>19:15:20:
>
>>Well, I think this could work, but I'm not sure how this will behave if 
>>an IndexReader is created on the new segment while it is still 
>>uncompound. Then when you try to delete the individual files, you'd have 
>>to implement something like "deletable" file for segments (to work with 
>>Windows file locking).
>
>That's right. I would use the deletable mechanism in IndexWriter
>to delete the non-compound files of the index after creation of
>the compound file. That's step 4 of my last mail. It would be done
>within a commit lock.
>
>>Anyway, what do you think of the original way proposed by Bernhard? I 
>>think that method was ok. If I understand correctly, in that method the 
>>merge process does not end until compound file is created (as before), 
>>but the files are deleted as they are merged in. I suppose there is a 
>>chance that the compound file creation process fails and we would not 
>>have any new segment since the files that were useable would have been 
>>half deleted. Is that what's bothering you in this solution? To me this 
>>seems acceptable because it shouldn't happen frequently. What do you 
>>think? Is there anything I'm missing about Bernhard's solution?
>
>In Bernhard's solution the old segments that have been merged into the
>new segment are still there while building the compound file. Disk space
>is saved by deleting the non-compound files of the new segment earlier
>than in the original implementation, immediately after copying them into
>the compound file. However, usually there are 1 to 3 big files in a
>segment. So the advantage in disk space is not as big as it could be
>with my solution. Individual files still exist in 3 copies for a short
>period of time (while they are copied). 
>
>Furthermore, deleting files in CompoundFileWriter.close is not
>what I would expect from a CompoundFileWriter. But since it is only
>used in SegmentMerger, it's ok.
>
>I am not insisting on my solution. I was just about to commit Bernhard's
>solution on Sunday, but then I thought it could be done better....
>
>Now I am not sure what to do. I am still a little bit in favour of my
>idea, but not so much....
>
>>(By the way, Thanks for helping to maintain and improve this code!)
>>Dmitry.
>
>I think we are all doing this because it's fun and there is such a
>great community immediately looking at, testing and reviewing our work.
>