lucene-dev mailing list archives

From: Michael McCandless
Subject: Re: Realtime Search for Social Networks Collaboration
Date: Mon, 08 Sep 2008 19:04:49 GMT

Yonik Seeley wrote:

>> I think it's quite feasible, but, it'd still have a "reopen" cost in
>> that any buffered delete by term or query would have to be
>> "materialized" into docIDs on reopen.  Though, if this somehow turns
>> out to be a problem, in the future we could do this materializing
>> immediately, instead of buffering, if we already have a reader open.
> Right... it seems like re-using readers internally is something we
> could already be doing in IndexWriter.
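
For concreteness, here is a rough sketch of what "materializing" a
buffered delete-by-term into docIDs on reopen could look like.  It is
plain Java, not IndexWriter code: the class BufferedDeletesSketch, its
methods, and the in-memory postings map are all hypothetical, and the
rule that a delete only applies to documents added before it was
buffered is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy writer; it only illustrates buffering vs. materializing deletes.
final class BufferedDeletesSketch {
    // term -> docIDs containing that term, in increasing docID order
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // term -> number of documents that existed when the delete was buffered
    private final Map<String, Integer> bufferedDeletes = new HashMap<>();
    private int nextDocId = 0;

    // Adds a document made of the given terms and returns its docID.
    int addDocument(String... terms) {
        int docId = nextDocId++;
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(docId);
        }
        return docId;
    }

    // Buffers the delete: nothing is resolved to docIDs yet.
    void deleteByTerm(String term) {
        bufferedDeletes.put(term, nextDocId);
    }

    // The "reopen" cost: resolve every buffered term to concrete docIDs.
    BitSet materializeDeletes() {
        BitSet deleted = new BitSet(nextDocId);
        for (Map.Entry<String, Integer> e : bufferedDeletes.entrySet()) {
            int limit = e.getValue();
            for (int docId : postings.getOrDefault(e.getKey(), new ArrayList<>())) {
                if (docId >= limit) {
                    break; // added after the delete was buffered; not affected
                }
                deleted.set(docId);
            }
        }
        bufferedDeletes.clear();
        return deleted;
    }
}
```

Doing the materializing immediately would amount to running the inner
loop inside deleteByTerm instead of deferring it to reopen.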


>> Flushing is somewhat tricky because any open RAM readers would then
>> have to cutover to the newly flushed segment once the flush
>> completes, so that the RAM buffer can be recycled for the next
>> segment.
> Re-use of a RAM buffer doesn't seem like such a big deal.
> But, how would you maintain a static view of an index...?
> IndexReader r1 = indexWriter.getCurrentIndex()
> indexWriter.addDocument(...)
> IndexReader r2 = indexWriter.getCurrentIndex()
> I assume r1 will have a view of the index before the document was
> added, and r2 after?

Right, getCurrentIndex would return a MultiReader that includes a
SegmentReader for each segment in the index, plus a "RAMReader" that
searches the RAM buffer.  That RAMReader is a tiny shell class that
would basically just record the max docID it's allowed to go up to
(the docID as of when it was opened), and stop enumerating docIDs
(e.g., in the TermDocs) when it hits a docID beyond that limit.
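
To make the docID-limit idea concrete, here is a minimal sketch in plain
Java.  It does not use any Lucene classes: RamPostings and RamReaderView
are hypothetical names, and a single list of docIDs stands in for the
postings of one term in the RAM buffer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one term's postings in the writer's RAM buffer.
final class RamPostings {
    private final List<Integer> docIds = new ArrayList<>();

    // The writer appends docIDs in increasing order.
    synchronized void add(int docId) {
        docIds.add(docId);
    }

    synchronized int size() {
        return docIds.size();
    }

    synchronized int get(int index) {
        return docIds.get(index);
    }
}

// Point-in-time view: shares the buffer but never reads past maxDocId.
final class RamReaderView {
    private final RamPostings postings;
    private final int maxDocId; // largest docID visible when the view was opened

    RamReaderView(RamPostings postings, int maxDocId) {
        this.postings = postings;
        this.maxDocId = maxDocId;
    }

    // Enumerates docIDs and stops at the first one beyond the limit.
    List<Integer> docs() {
        List<Integer> visible = new ArrayList<>();
        for (int i = 0; i < postings.size(); i++) {
            int docId = postings.get(i);
            if (docId > maxDocId) {
                break;
            }
            visible.add(docId);
        }
        return visible;
    }
}
```

With this, a view opened before an add keeps returning the same docIDs
after the add, while a view opened afterwards also sees the new one,
which is the r1/r2 behavior you describe.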

For reading stored fields and term vectors, which are now flushed  
immediately to disk, we need to somehow get an IndexInput from the  
IndexOutputs that IndexWriter holds open on these files.  Or, maybe,  
just open new IndexInputs?

> Another thing that will help is if users could get their hands on the
> sub-readers of a multi-segment reader.  Right now that is hidden in
> MultiSegmentReader and makes updating anything incrementally
> difficult.

Besides what's handled by MultiSegmentReader.reopen already, what else  
do you need to incrementally update?

