lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Realtime Search for Social Networks Collaboration
Date Tue, 09 Sep 2008 09:29:37 GMT

This would just tap into the live hashtable that DocumentsWriter*  
maintain for the posting lists... except the docFreq will need to be  
copied away on reopen, I think.
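A rough illustration of the "copy docFreq away on reopen" idea (the class and method names below are hypothetical stand-ins, not actual DocumentsWriter internals): the live term table keeps a mutable per-term count while indexing, and a point-in-time reader snapshots those counts at reopen so later updates don't bleed into its statistics.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the writer's live term hashtable: docFreq is
// mutable while indexing, and a reopened reader takes an immutable copy so
// subsequent updates don't change the stats it sees.
class LiveTermTable {
    private final Map<String, Integer> docFreq = new HashMap<>();

    // Called as each document's terms are indexed.
    void onTermSeen(String term) {
        docFreq.merge(term, 1, Integer::sum);
    }

    // Reopen-time snapshot: an immutable copy of the current docFreq values.
    Map<String, Integer> snapshotDocFreq() {
        return Map.copyOf(docFreq);
    }
}
```

Only the snapshot needs copying; the postings themselves can stay shared because the reader caps the docIDs it will enumerate.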

Mike

Jason Rutherglen wrote:

> Term dictionary?  I'm curious how that would be solved?
>
> On Mon, Sep 8, 2008 at 3:04 PM, Michael McCandless
> <lucene@mikemccandless.com> wrote:
>>
>> Yonik Seeley wrote:
>>
>>>> I think it's quite feasible, but it'd still have a "reopen" cost in
>>>> that any buffered delete by term or query would have to be
>>>> "materialized" into docIDs on reopen.  Though, if this somehow turns
>>>> out to be a problem, in the future we could do this materializing
>>>> immediately, instead of buffering, if we already have a reader open.
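That materialization step could be sketched roughly like this (illustrative structures only, not IndexWriter's real bookkeeping): delete-by-term entries stay buffered as terms until a reader is opened, then get resolved against the current postings into concrete docIDs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Buffered deletes held as terms; materialize() resolves them to docIDs
// against a term -> postings map at reopen time. Names are illustrative.
class BufferedDeletes {
    private final List<String> pendingTerms = new ArrayList<>();

    void deleteByTerm(String term) {
        pendingTerms.add(term);
    }

    // Reopen-time materialization: each buffered term becomes the set of
    // docIDs it currently matches; the term buffer is then cleared.
    Set<Integer> materialize(Map<String, List<Integer>> postings) {
        Set<Integer> deleted = new HashSet<>();
        for (String t : pendingTerms) {
            deleted.addAll(postings.getOrDefault(t, List.of()));
        }
        pendingTerms.clear();
        return deleted;
    }
}
```

Doing this eagerly (resolving at delete time rather than reopen time) trades per-delete cost for a cheaper reopen, which is the alternative mentioned above.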
>>>
>>> Right... it seems like re-using readers internally is something we
>>> could already be doing in IndexWriter.
>>
>> True.
>>
>>>> Flushing is somewhat tricky because any open RAM readers would then
>>>> have to cut over to the newly flushed segment once the flush
>>>> completes, so that the RAM buffer can be recycled for the next
>>>> segment.
>>>
>>> Re-use of a RAM buffer doesn't seem like such a big deal.
>>>
>>> But, how would you maintain a static view of an index...?
>>>
>>> IndexReader r1 = indexWriter.getCurrentIndex()
>>> indexWriter.addDocument(...)
>>> IndexReader r2 = indexWriter.getCurrentIndex()
>>>
>>> I assume r1 will have a view of the index before the document was
>>> added, and r2 after?
>>
>> Right, getCurrentIndex would return a MultiReader that includes a
>> SegmentReader for each segment in the index, plus a "RAMReader" that
>> searches the RAM buffer.  That RAMReader is a tiny shell class that
>> would basically just record the max docID it's allowed to go up to
>> (the docID as of when it was opened), and stop enumerating docIDs
>> (eg in the TermDocs) when it hits a docID beyond that limit.
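The "stop at the open-time max docID" trick can be sketched like this (a toy stand-in, not the real TermDocs API): the cursor records its boundary when opened, so documents the writer appends afterwards are invisible to it.

```java
import java.util.ArrayList;
import java.util.List;

// Toy append-only postings list plus a point-in-time cursor that records
// the max docID at open time and refuses to enumerate past it, even while
// the writer keeps appending. Not real Lucene classes.
class RamPostings {
    private final List<Integer> docIds = new ArrayList<>(); // appended in docID order

    void add(int docId) {
        docIds.add(docId);
    }

    // Open a cursor bounded by the docID limit visible right now.
    Cursor openCursor(int maxDocIdExclusive) {
        return new Cursor(maxDocIdExclusive);
    }

    class Cursor {
        private final int maxDocIdExclusive; // snapshot boundary
        private int pos = -1;

        Cursor(int maxDocIdExclusive) {
            this.maxDocIdExclusive = maxDocIdExclusive;
        }

        // Advance; returns false once we'd cross the snapshot boundary.
        boolean next() {
            int n = pos + 1;
            if (n >= docIds.size() || docIds.get(n) >= maxDocIdExclusive) return false;
            pos = n;
            return true;
        }

        int doc() {
            return docIds.get(pos);
        }
    }
}
```

This is why r1 and r2 in the example above can share the same underlying RAM buffer while seeing different views of it.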
>>
>> For reading stored fields and term vectors, which are now flushed
>> immediately to disk, we need to somehow get an IndexInput from the
>> IndexOutputs that IndexWriter holds open on these files.  Or, maybe,
>> just open new IndexInputs?
>>
>>> Another thing that will help is if users could get their hands on
>>> the sub-readers of a multi-segment reader.  Right now that is hidden
>>> in MultiSegmentReader and makes updating anything incrementally
>>> difficult.
>>
>> Besides what's handled by MultiSegmentReader.reopen already, what
>> else do you need to incrementally update?
>>
>> Mike
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>
>
>



