lucene-dev mailing list archives

From Dmitry Serebrennikov <dmit...@earthlink.net>
Subject Re: idea for reducing file handle use
Date Tue, 23 Sep 2003 22:48:57 GMT
Doug Cutting wrote:

> Dmitry Serebrennikov wrote:
>
>> Ok, I am working on a version that would limit the changes to the 
>> Directory class, but this directory would have to make certain 
>> assumptions about the names of the files (whereas right now it 
>> doesn't care). It would have to differentiate the segments file, the 
>> deleted documents file(s?), and the other segment files. It would 
>> also have to assume that the part before the last "." in a file name 
>> is the segment name. Does this sound better than the other idea?
>
>
> That sounds a little ugly.
>
> Perhaps the Directory API could be extended to better support your 
> technique.  For example, one could add subdirectory notion.  One could 
> create a new subdirectory for each segment, and then explicitly close 
> it once the segment is complete.  On close, it could be optimized by 
> appending its files into a single file and writing a table-of-contents 
> file. 

Yes, I thought about this. But that would require everyone to change 
their Directory implementation (since Directory is an interface, it can't 
supply a default implementation for these methods), so the code wouldn't 
be a simple "drop-in".
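To make the "append the segment's files into one file plus a table of contents" idea concrete, here is a minimal in-memory sketch. The class and method names (CompoundFileSketch, pack, read) are hypothetical, and the layout (entry count, then name/offset/length per entry, then the concatenated data) is an assumption for illustration only. It is not Lucene's actual compound-file format or API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: pack a segment's files into one blob with a TOC up front. */
public class CompoundFileSketch {

    /** Writes the TOC (count, then name/offset/length per file) followed by all file data. */
    public static byte[] pack(Map<String, byte[]> files) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(files.size());              // number of TOC entries
            long offset = 0;                         // offsets are relative to the end of the TOC
            for (Map.Entry<String, byte[]> e : files.entrySet()) {
                out.writeUTF(e.getKey());            // sub-file name, e.g. "_1.frq"
                out.writeLong(offset);               // where this file's bytes start
                out.writeLong(e.getValue().length);  // how many bytes it spans
                offset += e.getValue().length;
            }
            for (byte[] data : files.values()) {
                out.write(data);                     // concatenated contents, in TOC order
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);      // cannot happen with in-memory streams
        }
    }

    /** Returns the bytes of one named sub-file, or null if the TOC has no such entry. */
    public static byte[] read(byte[] compound, String name) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(compound));
            int count = in.readInt();
            Map<String, long[]> toc = new LinkedHashMap<>();
            for (int i = 0; i < count; i++) {
                String entryName = in.readUTF();
                long offset = in.readLong();
                long length = in.readLong();
                toc.put(entryName, new long[] {offset, length});
            }
            long[] entry = toc.get(name);
            if (entry == null) {
                return null;
            }
            // Everything left unread is file data, so the data region starts here.
            int dataStart = compound.length - in.available();
            int start = dataStart + (int) entry[0];
            return Arrays.copyOfRange(compound, start, start + (int) entry[1]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, byte[]> segment = new LinkedHashMap<>();
        segment.put("_1.fnm", new byte[] {1, 2});
        segment.put("_1.frq", new byte[] {3, 4, 5});
        byte[] packed = pack(segment);
        System.out.println(read(packed, "_1.frq").length); // prints 3
    }
}
```

Putting the TOC first means a reader can locate any sub-file after one small sequential read, at the cost of knowing all file sizes before writing, which fits the "optimize on close" timing Doug describes.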

>
>
> If the indexing code were changed as above, would you still need to 
> know anything about the segments file or deletions files?  It seems to 
> me that deletions could be handled by adding a new file into the 
> subdirectory, so that a subdirectory contains both the optimized 
> content, and any files added afterwards, or somesuch.

Yeah, the .del file is fixed length, and I thought about incorporating it 
into the "compound file", or at least appending it later. This can still be 
done. For now, the drop from 24 files to 2 is good enough for me, so the 
further halving from 2 to 1 can wait. But I think it can be done with the 
way I ended up implementing things, too.
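The file-name assumptions from the first message (the segments file is special, .del files hold deletions, and otherwise the part before the last "." is the segment name) can be sketched as a small classifier that decides which files go into a segment's compound file and which stay separate. SegmentFileNames and its methods are hypothetical names, and the example extensions are only illustrative; keeping .del files out reflects the "add it later" option above, not a settled design.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the naming conventions a compound-file Directory would assume. */
public class SegmentFileNames {

    /** Segment name for a per-segment file (part before the last "."), or null if none. */
    public static String segmentOf(String fileName) {
        if (fileName.equals("segments")) {
            return null;                             // index-wide file, not per segment
        }
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : null;
    }

    /** True if the file could be packed into its segment's compound file. */
    public static boolean isCompoundCandidate(String fileName) {
        // Deletions change after the segment is written, so .del files stay separate here.
        return segmentOf(fileName) != null && !fileName.endsWith(".del");
    }

    /** Groups compound-file candidates by segment name, sorted by segment. */
    public static Map<String, List<String>> groupBySegment(List<String> fileNames) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String name : fileNames) {
            if (isCompoundCandidate(name)) {
                groups.computeIfAbsent(segmentOf(name), k -> new ArrayList<>()).add(name);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("segments", "_1.fnm", "_1.frq", "_1.del", "_2.fnm");
        System.out.println(groupBySegment(files)); // prints {_1=[_1.fnm, _1.frq], _2=[_2.fnm]}
    }
}
```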

Thanks.
Dmitry.


