lucene-dev mailing list archives

From Dmitry Serebrennikov
Subject Re: optimized disk usage when creating a compound index
Date Thu, 12 Aug 2004 18:44:33 GMT
Hi Christoph,

I agree that your approach achieves better disk usage than deleting 
segments as they are being merged into the compound file, chiefly 
because most indexes have one or two large files and the rest are small. 
I have not reviewed your latest code yet (it's a bit hard without a 
checked out working copy of the CVS image, btw, could you post diffs so 
others can more readily review them?), but from what you are describing 
here's what I think. It sounds like it would work, but it also sounds a 
bit kludgy. The main thing that I don't like is that we are now 
inventing another way of doing what Lucene already does - maintaining 
index integrity across filesystem changes and safely deleting unneeded 
files. I'm thinking that Lucene already has a way of switching to the 
new segments file, but we are proposing something similar with renaming 
of the cfs file.

A note on the norms with .f and .s files - this is getting complicated...

One note on SegmentReader.files() - we should probably have the "tmp" 
extension listed here so we can clean up segments that failed to create a 
cfs file.
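
To illustrate the point about listing the "tmp" extension, here is a minimal 
sketch. It is not the actual SegmentReader code: the class name, the method 
and the extension list are assumptions made up for the example, and the real 
list in Lucene is different (it also has to cover the per-field norm files).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not Lucene's SegmentReader.
final class SegmentFileNames {
    // Illustrative extension list (an assumption). The point is that "tmp"
    // is listed, so a leftover <segment>.tmp from a failed compound-file
    // build is treated as belonging to the segment and can be deleted.
    static final String[] EXTENSIONS = {
        "cfs", "tmp", "fnm", "fdx", "fdt", "tii", "tis", "frq", "prx", "del"
    };

    // Returns every file name that may belong to the given segment.
    static List<String> files(String segment) {
        List<String> names = new ArrayList<>();
        for (String ext : EXTENSIONS) {
            names.add(segment + "." + ext);
        }
        return names;
    }
}
```

Whatever code deletes the files of an obsolete segment would then pick up a 
stale "tmp" file without any special casing.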

Here's an alternative idea that leverages the existing Lucene segments file:
Could we simply create the compound file in a new segment? This way we don't 
have to invent the "tmp" file or change anything else about the files 
(like the norms stuff).
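
A rough sketch of the sequence this would imply, written against plain 
java.nio rather than Lucene's Directory API. Everything here is an assumption 
for illustration: the class and method names, the one-line "segments" file 
format, and the use of a "segments.new" file that is renamed into place.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical sketch; not Lucene code.
final class CompoundSwitchSketch {
    static void switchToCompound(Path dir, String newSegment,
                                 byte[] compoundData,
                                 List<String> oldFiles) throws IOException {
        // 1. Write the compound file under a brand-new segment name.
        //    Readers still use the old segment, so no "tmp" name is needed.
        Files.write(dir.resolve(newSegment + ".cfs"), compoundData);

        // 2. Publish the new segment: write the new segments listing to a
        //    side file, then rename it over the current one. Whether an
        //    atomic move replaces an existing target is platform-dependent.
        Path pending = dir.resolve("segments.new");
        Files.write(pending, newSegment.getBytes(StandardCharsets.UTF_8));
        Files.move(pending, dir.resolve("segments"),
                   StandardCopyOption.ATOMIC_MOVE);

        // 3. Only now delete the files of the old, non-compound segment.
        for (String name : oldFiles) {
            Files.deleteIfExists(dir.resolve(name));
        }
    }
}
```

If the process dies before step 2, the old segment is still the one named in 
the segments file and the half-written compound file is just an orphan. The 
trade-off is disk usage: the old files are all still present while the new 
compound file is being written.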

All in all, I haven't really been involved in the Lucene codebase closely 
enough lately, and this is starting to impact things like norms, locks, 
and merging, so I don't feel qualified to make the final call on 
this. I'd like to hear what Doug and others think. From my point of 
view, I don't really see anything *wrong* with the latest set of changes 
(we just need to add the "tmp" file to SegmentReader.files()), but it doesn't 
strike me as an obviously *right* way to do this either yet. So I'll 
change my vote to a 0 and see what others think. :)


