lucene-dev mailing list archives

From Doug Cutting
Subject Re: DbDirectory and compound files
Date Thu, 30 Sep 2004 15:56:06 GMT
Andi Vajda wrote:
> You ask if this makes sense. No, not really. I don't know the details of 
> the purpose of the compound file implementation so this may be my problem.

The purpose of the compound file implementation is to minimize the 
number of files that an IndexReader must keep open.  Instead of 7 files 
plus one per indexed field for each segment, only a single file must be 
kept open per segment.  This helps applications which keep lots of 
unoptimized indexes open.  (It also, and this is more common, helps 
folks who open a new IndexReader for each query and don't close it.  In 
this case, opening fewer files gives the garbage collector time to close 
files before the process runs into its file descriptor limit, inducing a 
flurry of bug reports about "too many open files".)
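
To put rough numbers on that, here is a small sketch of the arithmetic.  The 
class and method names are made up for illustration and are not part of 
Lucene, and the workload (20 unclosed readers, 10 segments each, 5 indexed 
fields) is an assumption, not a measurement.

```java
// Rough arithmetic only; OpenFileEstimate is a hypothetical helper, not a Lucene class.
final class OpenFileEstimate {
    // Per the explanation above: 7 files plus one per indexed field
    // in a segment that does not use the compound format.
    static final int FIXED_FILES_PER_SEGMENT = 7;

    // Files an IndexReader keeps open for one segment.
    static int filesPerSegment(int indexedFields, boolean compound) {
        return compound ? 1 : FIXED_FILES_PER_SEGMENT + indexedFields;
    }

    // Total files held open by a number of readers that were never closed.
    static long openFiles(int readers, int segmentsPerReader,
                          int indexedFields, boolean compound) {
        return (long) readers * segmentsPerReader
                * filesPerSegment(indexedFields, compound);
    }

    public static void main(String[] args) {
        // Assumed workload: 20 unclosed readers, 10 segments each, 5 indexed fields.
        System.out.println("multi-file: " + openFiles(20, 10, 5, false)); // 2400
        System.out.println("compound:   " + openFiles(20, 10, 5, true));  // 200
    }
}
```

With those assumed numbers the multi-file format is well past a typical 
1024 descriptor limit, while the compound format is not.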

Does that make any more sense?

> However, from earlier posts of yours, it seems that the Directory 
> implementation classes such as OutputStream et al are being deprecated 
> and replaced by others, so it may very well be that DbDirectory needs to 
> be rewritten when these changes are finalized.

These changes are back-compatible: the old classes and methods are still 
there and interoperate with the new ones, but are deprecated.  You might 
wait until there is a Lucene release with the new API in it before you 
update DbDirectory.  To move to the new API, all that should be required 
is changing your subclass of InputStream to instead subclass 
BufferedIndexInput, and changing your subclass of OutputStream to 
instead subclass BufferedIndexOutput.  You'll also need to add a 
length() method to your BufferedIndexInput subclass, instead of setting 
a protected length field in the constructor.  That's it.
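
The length() change can be sketched as follows.  The two abstract base 
classes here are simplified stand-ins written for this example, not the 
real Lucene classes, and the byte-array-backed subclasses are hypothetical 
placeholders for whatever DbDirectory actually reads from.  Only the place 
where the length lives is meant to be illustrative.

```java
// Stand-ins for illustration only: these are NOT Lucene's classes, just the
// minimum needed to show where the length moves.

// Old style: the base class holds a protected length field.
abstract class OldStyleInput {
    protected long length;                 // subclass assigns this in its constructor
    public final long length() { return length; }
}

// New style: the base class asks the subclass for the length.
abstract class NewStyleInput {
    public abstract long length();
}

// Hypothetical input backed by a byte array, written against the old style.
final class OldDbInput extends OldStyleInput {
    private final byte[] data;

    OldDbInput(byte[] data) {
        this.data = data;
        this.length = data.length;         // old API: set the protected field
    }
}

// The same input written against the new style.
final class NewDbInput extends NewStyleInput {
    private final byte[] data;

    NewDbInput(byte[] data) {
        this.data = data;                  // no length bookkeeping here any more
    }

    @Override
    public long length() {
        return data.length;                // new API: answer on demand
    }
}
```

Callers see the same length() result either way; only the subclass's 
bookkeeping moves from the constructor into a method.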

The revision of the API was primarily to make buffering optional.  We 
could have left the buffered implementation names the same, but then the 
classes would be named poorly and it also seemed like an opportunity to 
remove the name clash with

