lucene-dev mailing list archives

From "Jason Rutherglen" <jason.rutherg...@gmail.com>
Subject Re: SegmentReader with custom setting of deletedDocs, single reusable FieldsReader
Date Sat, 28 Jun 2008 01:11:35 GMT
One possible solution to the FieldsReader issue is to add an
IndexReader.documents() method that returns a Documents class.  The Documents
class would maintain an underlying FieldsReader and file descriptor, and could
be closed like TermEnum or TermDocs.  It would of course have a
document(int n, FieldSelector selector) method.  The open question is what the
default behavior of IndexReader.document should be for
SegmentReader.clone/reopen(boolean force).  I'm not sure how efficient it
would be to open and close a FieldsReader on every IndexReader.document call.
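
The Documents idea could look roughly like the sketch below.  It is
self-contained and is not Lucene code: Reader, FieldsReader and FieldSelector
are simplified stand-ins invented for illustration (stored fields are plain
in-memory maps), and only the documents() / document(int n, FieldSelector
selector) / close() shape comes from the proposal above.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DocumentsSketch {
    /** Hypothetical stand-in for Lucene's FieldSelector: picks which fields to load. */
    interface FieldSelector {
        boolean accept(String fieldName);
    }

    /** Hypothetical stand-in for FieldsReader; a real one would own a file handle. */
    static class FieldsReader implements Closeable {
        private final List<Map<String, String>> stored;
        private boolean open = true;

        FieldsReader(List<Map<String, String>> stored) { this.stored = stored; }

        Map<String, String> doc(int n, FieldSelector selector) {
            if (!open) throw new IllegalStateException("FieldsReader is closed");
            Map<String, String> result = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : stored.get(n).entrySet()) {
                // A null selector means "load every stored field".
                if (selector == null || selector.accept(e.getKey())) {
                    result.put(e.getKey(), e.getValue());
                }
            }
            return result;
        }

        @Override public void close() { open = false; }
    }

    /** The proposed handle: one FieldsReader shared by many document() calls. */
    static class Documents implements Closeable {
        private final FieldsReader fieldsReader;

        Documents(FieldsReader fieldsReader) { this.fieldsReader = fieldsReader; }

        Map<String, String> document(int n, FieldSelector selector) {
            return fieldsReader.doc(n, selector);
        }

        @Override public void close() { fieldsReader.close(); }
    }

    /** Hypothetical stand-in for the IndexReader side of the proposal. */
    static class Reader {
        private final List<Map<String, String>> stored;

        Reader(List<Map<String, String>> stored) { this.stored = stored; }

        // Each call hands out an independently closeable Documents instance.
        Documents documents() { return new Documents(new FieldsReader(stored)); }
    }

    static List<Map<String, String>> sampleSegment() {
        List<Map<String, String>> docs = new ArrayList<>();
        Map<String, String> d0 = new LinkedHashMap<>();
        d0.put("id", "0"); d0.put("title", "first"); d0.put("body", "hello");
        Map<String, String> d1 = new LinkedHashMap<>();
        d1.put("id", "1"); d1.put("title", "second"); d1.put("body", "world");
        docs.add(d0);
        docs.add(d1);
        return docs;
    }

    public static void main(String[] args) {
        Reader reader = new Reader(sampleSegment());
        try (Documents documents = reader.documents()) {
            // prints {id=0, title=first, body=hello}
            System.out.println(documents.document(0, null));
            // prints {title=second}
            System.out.println(documents.document(1, name -> name.equals("title")));
        }
    }
}
```

With this shape the caller decides how long the FieldsReader lives, so a batch
of document() calls pays the open/close cost once rather than once per call.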

I was using InstantiatedIndex and performing a commit whenever changes came in,
but realized that searches arriving during the commit could see wrong
results.  Now, as the InstantiatedIndex javadocs suggest, I treat each one as
immutable.

The IndexReader over the RAM buffer sounds good.  As an interim solution it
would be beneficial to have the SegmentReader.clone/reopen(boolean force) so
that the first version of Ocean can be completed and I can move on to other
projects like Tag Index.

On Fri, Jun 27, 2008 at 2:43 PM, Michael McCandless <lucene@mikemccandless.com> wrote:

> Jason Rutherglen <jason.rutherglen@gmail.com> wrote:
>
> > One of the things I do not understand about IndexWriter deletes is
> > it does not reuse an already open TermInfosReader with the tii
> > loaded.  Isn't this slower than deleting using an already open
> > IndexReader?
>
> That's right: every time IW decides to flush deletes (which is
> currently before a merge starts, when autoCommit=false), it visits
> each segment and 1) opens a SegmentReader, 2) translates all buffered
> deletes (by term, by query) into docIDs stored into the deletedDocs of
> that SegmentReader, 3) writes new _X_N.del files to record the deletes
> for the segment, and 4) closes the SegmentReader.
>
> We could instead keep these SegmentReaders open and reuse them for
> applying deletes.  Then the IndexWriter could present an IndexReader
> (MultiReader) that reads these segments, plus the IndexReader reading
> buffered docs in RAM.  This would basically be a "combined
> IndexWriter / IndexReader".
>
> I think the IndexReader that reads DocumentsWriter's RAM buffer would
> still search a point-in-time snapshot of the index, unlike
> InstantiatedIndexReader, and require an explicit reopen() to refresh.
> This is because some non-trivial computation is still required when
> there are changes.  EG if a delete-by-query has happened, reopen()
> must resolve that query into docIDs.
>
> Mike
>
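
The point-in-time behavior described in the quoted message, where buffered
deletes only become visible after an explicit reopen(), can be illustrated
with a toy model.  This is a sketch under loud assumptions, not Lucene's
implementation: Writer, Snapshot and BufferedDelete are hypothetical names,
documents are plain strings, and a "term" matches by substring.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class SnapshotReopenSketch {
    /** Immutable point-in-time view: private copies of the docs and deleted bits. */
    static final class Snapshot {
        private final List<String> docs;
        private final BitSet deleted;

        Snapshot(List<String> docs, BitSet deleted) {
            this.docs = new ArrayList<>(docs);
            this.deleted = (BitSet) deleted.clone();
        }

        int maxDoc() { return docs.size(); }
        int numDocs() { return docs.size() - deleted.cardinality(); }
        boolean isDeleted(int docId) { return deleted.get(docId); }
    }

    /** A buffered delete applies only to docs added before it was issued. */
    static final class BufferedDelete {
        final String term;
        final int docLimit;

        BufferedDelete(String term, int docLimit) {
            this.term = term;
            this.docLimit = docLimit;
        }
    }

    /** Hypothetical combined writer/reader: buffers deletes until reopen(). */
    static final class Writer {
        private final List<String> docs = new ArrayList<>();
        private final BitSet deleted = new BitSet();
        private final List<BufferedDelete> buffered = new ArrayList<>();

        void addDocument(String text) { docs.add(text); }

        // Cheap: just remember the term; no docIDs are resolved yet.
        void deleteDocuments(String term) {
            buffered.add(new BufferedDelete(term, docs.size()));
        }

        // The non-trivial step: resolve buffered deletes into docIDs,
        // then hand out a fresh immutable snapshot.
        Snapshot reopen() {
            for (BufferedDelete d : buffered) {
                for (int docId = 0; docId < d.docLimit; docId++) {
                    if (docs.get(docId).contains(d.term)) {
                        deleted.set(docId);
                    }
                }
            }
            buffered.clear();
            return new Snapshot(docs, deleted);
        }
    }

    public static void main(String[] args) {
        Writer writer = new Writer();
        writer.addDocument("apple pie");
        writer.addDocument("banana split");
        Snapshot before = writer.reopen();

        writer.deleteDocuments("apple");
        System.out.println(before.numDocs()); // prints 2: old snapshot is unchanged

        Snapshot after = writer.reopen();
        System.out.println(after.numDocs());  // prints 1: delete resolved on reopen
    }
}
```

Searches running against the old snapshot never observe a half-applied
delete, which is the property that was missing when committing to a live
InstantiatedIndex.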
