lucene-dev mailing list archives

From "Jason Rutherglen" <jason.rutherg...@gmail.com>
Subject Re: Realtime Search for Social Networks Collaboration
Date Wed, 10 Sep 2008 20:03:31 GMT
Hi Mike,

Would there be a new sorted list, or something similar, to replace the
hashtable?  That seems like an issue that is not solved yet.

Jason

On Tue, Sep 9, 2008 at 5:29 AM, Michael McCandless
<lucene@mikemccandless.com> wrote:
>
> This would just tap into the live hashtable that DocumentsWriter* maintain
> for the posting lists... except the docFreq will need to be copied away on
> reopen, I think.
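
To make the term dictionary question concrete, here is a minimal Java sketch of one possible reading: the writer keeps a live hash map of term to docFreq, and each reopen copies it into a sorted, read-only map so that the docFreq values stay fixed and terms can be enumerated in order. The class and method names (TermDictSnapshotSketch, recordDocWithTerm, reopen) are hypothetical and are not Lucene APIs, and the thread does not settle on this approach. Copying and sorting every term on each reopen is O(n log n), which is part of why the question is still open.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration only; not DocumentsWriter's real data structure.
final class TermDictSnapshotSketch {
    // Live, unsorted table that indexing keeps updating (term -> docFreq).
    private final Map<String, Integer> liveDocFreq = new HashMap<>();

    // Simulates indexing one more document that contains the given term.
    synchronized void recordDocWithTerm(String term) {
        liveDocFreq.merge(term, 1, Integer::sum);
    }

    // On "reopen": copy the docFreq values away into a sorted, immutable view.
    // Later indexing changes liveDocFreq but never this snapshot.
    synchronized NavigableMap<String, Integer> reopen() {
        return Collections.unmodifiableNavigableMap(new TreeMap<>(liveDocFreq));
    }
}
```

A snapshot taken before further documents are added keeps reporting the old docFreq values, while a later reopen sees the new ones.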
>
> Mike
>
> Jason Rutherglen wrote:
>
>> Term dictionary?  I'm curious how that would be solved?
>>
>> On Mon, Sep 8, 2008 at 3:04 PM, Michael McCandless
>> <lucene@mikemccandless.com> wrote:
>>>
>>> Yonik Seeley wrote:
>>>
>>>>> I think it's quite feasible, but, it'd still have a "reopen" cost in that
>>>>> any buffered delete by term or query would have to be "materialized" into
>>>>> docIDs on reopen.  Though, if this somehow turns out to be a problem, in the
>>>>> future we could do this materializing immediately, instead of buffering, if
>>>>> we already have a reader open.
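
As a rough illustration of the "reopen" cost described above, the following Java sketch buffers delete-by-term requests and only resolves them to concrete docIDs when a reader is reopened. All names here (BufferedDeletesSketch, deleteByTerm, materializeDeletes) are hypothetical, and the toy in-memory postings map stands in for a real index. Recording a docID limit with each buffered delete, so that documents added later are not affected, is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not IndexWriter's real delete handling.
final class BufferedDeletesSketch {
    // Toy inverted index: term -> docIDs that contain it, in increasing order.
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // Buffered deletes: term -> exclusive docID limit the delete applies to.
    private final Map<String, Integer> bufferedDeleteTerms = new HashMap<>();
    // Deletes that have already been resolved to docIDs.
    private final BitSet deletedDocs = new BitSet();
    private int nextDocId = 0;

    int addDocument(String... terms) {
        int docId = nextDocId++;
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(docId);
        }
        return docId;
    }

    // Cheap: just remember the term and how many docs existed at this point.
    void deleteByTerm(String term) {
        bufferedDeleteTerms.put(term, nextDocId);
    }

    // The reopen cost: walk the postings of every buffered term and mark docIDs.
    BitSet materializeDeletes() {
        for (Map.Entry<String, Integer> e : bufferedDeleteTerms.entrySet()) {
            List<Integer> docs = postings.getOrDefault(e.getKey(), Collections.emptyList());
            for (int doc : docs) {
                if (doc < e.getValue()) {
                    deletedDocs.set(doc);
                }
            }
        }
        bufferedDeleteTerms.clear();
        return (BitSet) deletedDocs.clone();
    }
}
```

Materializing immediately, as suggested for the case where a reader is already open, would amount to running the loop body inside deleteByTerm instead of deferring it.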
>>>>
>>>> Right... it seems like re-using readers internally is something we
>>>> could already be doing in IndexWriter.
>>>
>>> True.
>>>
>>>>> Flushing is somewhat tricky because any open RAM readers would then have to
>>>>> cutover to the newly flushed segment once the flush completes, so that the
>>>>> RAM buffer can be recycled for the next segment.
>>>>
>>>> Re-use of a RAM buffer doesn't seem like such a big deal.
>>>>
>>>> But, how would you maintain a static view of an index...?
>>>>
>>>> IndexReader r1 = indexWriter.getCurrentIndex()
>>>> indexWriter.addDocument(...)
>>>> IndexReader r2 = indexWriter.getCurrentIndex()
>>>>
>>>> I assume r1 will have a view of the index before the document was
>>>> added, and r2 after?
>>>
>>> Right, getCurrentIndex would return a MultiReader that includes
>>> SegmentReader for each segment in the index, plus a "RAMReader" that
>>> searches the RAM buffer.  That RAMReader is a tiny shell class that would
>>> basically just record the max docID it's allowed to go up to (the docID as
>>> of when it was opened), and stop enumerating docIDs (eg in the TermDocs)
>>> when it hits a docID beyond that limit.
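
The docID limit described above can be sketched in a few lines of Java. This is only an illustration of the idea under simplifying assumptions: a single append-only postings list stands in for the RAM buffer, and RamBufferSketch, LimitedView, and openView are hypothetical names rather than Lucene classes or methods. Each view records the highest docID that existed when it was opened and stops enumerating once it passes that limit, which gives the r1/r2 behaviour asked about earlier in the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the proposed RAMReader implementation.
final class RamBufferSketch {
    // Append-only postings for one term, standing in for the writer's RAM buffer.
    private final List<Integer> postings = new ArrayList<>();
    private int nextDocId = 0;

    // Simulates indexWriter.addDocument(...) for a document containing the term.
    synchronized int addDocument() {
        int docId = nextDocId++;
        postings.add(docId);
        return docId;
    }

    // Opens a point-in-time view limited to the docIDs that exist right now.
    synchronized LimitedView openView() {
        return new LimitedView(this, nextDocId - 1);
    }

    private synchronized int size() {
        return postings.size();
    }

    private synchronized int docAt(int index) {
        return postings.get(index);
    }

    // Tiny shell over the shared buffer: it only remembers its max docID.
    static final class LimitedView {
        private final RamBufferSketch buffer;
        private final int maxDocId;

        LimitedView(RamBufferSketch buffer, int maxDocId) {
            this.buffer = buffer;
            this.maxDocId = maxDocId;
        }

        // Enumerates docIDs from the live buffer, stopping beyond the limit.
        List<Integer> docs() {
            List<Integer> visible = new ArrayList<>();
            for (int i = 0; i < buffer.size(); i++) {
                int doc = buffer.docAt(i);
                if (doc > maxDocId) {
                    break;
                }
                visible.add(doc);
            }
            return visible;
        }
    }
}
```

Because docIDs are only ever appended in increasing order, the view needs no copy of the buffer; the limit alone is enough to keep its results stable while indexing continues.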
>>>
>>> For reading stored fields and term vectors, which are now flushed
>>> immediately to disk, we need to somehow get an IndexInput from the
>>> IndexOutputs that IndexWriter holds open on these files.  Or, maybe, just
>>> open new IndexInputs?
>>>
>>>> Another thing that will help is if users could get their hands on the
>>>> sub-readers of a multi-segment reader.  Right now that is hidden in
>>>> MultiSegmentReader and makes updating anything incrementally
>>>> difficult.
>>>
>>> Besides what's handled by MultiSegmentReader.reopen already, what else do
>>> you need to incrementally update?
>>>
>>> Mike
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org