lucene-dev mailing list archives

From "Jason Rutherglen" <jason.rutherg...@gmail.com>
Subject Re: Realtime Search for Social Networks Collaboration
Date Mon, 08 Sep 2008 20:35:58 GMT
Term dictionary?  I'm curious how that would be solved.

On Mon, Sep 8, 2008 at 3:04 PM, Michael McCandless
<lucene@mikemccandless.com> wrote:
>
> Yonik Seeley wrote:
>
>>> I think it's quite feasible, but it'd still have a "reopen" cost in that
>>> any buffered delete by term or query would have to be "materialized"
>>> into docIDs on reopen.  Though, if this somehow turns out to be a
>>> problem, in the future we could do this materializing immediately,
>>> instead of buffering, if we already have a reader open.
>>
>> Right... it seems like re-using readers internally is something we
>> could already be doing in IndexWriter.
>
> True.
>
>>> Flushing is somewhat tricky because any open RAM readers would then have
>>> to cut over to the newly flushed segment once the flush completes, so
>>> that the RAM buffer can be recycled for the next segment.
>>
>> Re-use of a RAM buffer doesn't seem like such a big deal.
>>
>> But, how would you maintain a static view of an index...?
>>
>> IndexReader r1 = indexWriter.getCurrentIndex();
>> indexWriter.addDocument(...);
>> IndexReader r2 = indexWriter.getCurrentIndex();
>>
>> I assume r1 will have a view of the index before the document was
>> added, and r2 after?
>
> Right, getCurrentIndex would return a MultiReader that includes
> SegmentReader for each segment in the index, plus a "RAMReader" that
> searches the RAM buffer.  That RAMReader is a tiny shell class that would
> basically just record the max docID it's allowed to go up to (the docID as
> of when it was opened), and stop enumerating docIDs (eg in the TermDocs)
> when it hits a docID beyond that limit.
>
> For reading stored fields and term vectors, which are now flushed
> immediately to disk, we need to somehow get an IndexInput from the
> IndexOutputs that IndexWriter holds open on these files.  Or, maybe, just
> open new IndexInputs?
>
>> Another thing that will help is if users could get their hands on the
>> sub-readers of a multi-segment reader.  Right now that is hidden in
>> MultiSegmentReader and makes updating anything incrementally
>> difficult.
>
> Besides what's handled by MultiSegmentReader.reopen already, what else do
> you need to incrementally update?
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
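
The "RAMReader" idea quoted above (record the max docID when the reader is
opened, and stop enumerating past it) can be sketched in plain Java. This is
only an illustration under stated assumptions: RamBufferSketch and Snapshot
are hypothetical names, not Lucene classes, the buffer is a simple
append-only list rather than IndexWriter's real RAM buffer, and termDocs
does a linear scan instead of consulting a term dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for an append-only RAM buffer; not a Lucene class. */
class RamBufferSketch {
    // docID is simply the position of the document in this list.
    private final List<String> docs = new ArrayList<>();

    /** Appends a document and returns its docID. */
    synchronized int addDocument(String text) {
        docs.add(text);
        return docs.size() - 1;
    }

    /** Returns a point-in-time view limited to the docs added so far. */
    synchronized Snapshot getCurrentIndex() {
        return new Snapshot(this, docs.size());
    }

    private synchronized String doc(int docID) {
        return docs.get(docID);
    }

    /** Tiny shell that only remembers how far it is allowed to read. */
    static final class Snapshot {
        private final RamBufferSketch buffer;
        private final int maxDoc; // exclusive upper bound on visible docIDs

        Snapshot(RamBufferSketch buffer, int maxDoc) {
            this.buffer = buffer;
            this.maxDoc = maxDoc;
        }

        /** Enumerates matching docIDs, stopping at the limit fixed at open time. */
        List<Integer> termDocs(String term) {
            List<Integer> hits = new ArrayList<>();
            for (int docID = 0; docID < maxDoc; docID++) {
                if (Arrays.asList(buffer.doc(docID).split("\\s+")).contains(term)) {
                    hits.add(docID);
                }
            }
            return hits;
        }
    }

    public static void main(String[] args) {
        RamBufferSketch writer = new RamBufferSketch();
        writer.addDocument("realtime search");
        Snapshot r1 = writer.getCurrentIndex();
        writer.addDocument("realtime index");
        Snapshot r2 = writer.getCurrentIndex();

        System.out.println(r1.termDocs("realtime")); // prints [0]
        System.out.println(r2.termDocs("realtime")); // prints [0, 1]
    }
}
```

Here r1 keeps seeing only the first document even though both snapshots
share the same underlying buffer, which matches the r1/r2 behaviour asked
about above. The sketch does not address deletes, flushing, or the
cut-over to a newly flushed segment.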
