lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter
Date Wed, 18 Feb 2009 15:45:01 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674662#action_12674662 ]

Michael McCandless commented on LUCENE-1516:
--------------------------------------------


Looks good, Jason.  This is a big change, and I expect we'll go through
a number of iterations before settling... plus we still need to figure
out how the API is exposed.  Comments:

  * All this logic needs to be conditional (this also depends on what
    API we actually settle on to expose this...): right now you always
    open a reader whenever IW is created.

  * We should assume we do not need to support autoCommit=true in this
    patch (since this will land after 3.0).  This simplifies things.

  * IW.reopenInternalReader only does a clone, not a reopen; how does
    it cover the newly flushed segment?

  * After a merge commits you don't seem to reopen the reader?  This
    is actually tricky to do right, for realtime search: we somehow
    need to allow for warming of the newly created (merged) segment,
    in such a way that we do not block the flushing of further
    segments and reopen of readers against those new segments.  I
    think what may be best is to subclass IW, and override a newly
    added "postMerge" method that's invoked on the new segment before
    the merge is committed into the SegmentInfos.  This is cleaner
    than allowing the change into the SegmentInfos and then having to
    make a custom deletion policy & track history of each segment.

  * It seems like reader.reopen() (where reader was obtained with
    IW.getReader()) doesn't do the right thing?  (i.e., it's looking for
    the most recent segments_N in the Directory, but it should be
    looking at IW.segmentInfos instead).

  * I think we should decouple "materializing deletes down to docIDs"
    from "flushing deletes to disk".  IW does both as the same
    operation now (because it doesn't want to hold SR open for a long
    time), but once we have persistent open SegmentReaders we should
    separate these.  It's not necessary for IW to write new .del files
    when it materializes deletes.

  * Instead of having to merge readers, I think we should have a
    single source to obtain an SR from.  This way, when IW needs to
    materialize deletes, it will grab the same instance of SR for a
    given segment that the currently open MSR is using.  Also, when
    merging kicks off, it'll grab the SR from the same source (this
    way deletes in RAM will be correctly merged away).  Also, I think
    we should not use MSR for doing deletions (and still go segment by
    segment): it's quite a bit slower since every invocation must do
    the binary search again.

  * Likewise, you have to fix commitMergedDeletes to decouple
    computing the new BitVector from writing the .del file to disk.
    That method should only create a new BitVector, for the newly
    merged segment.  It must be synchronized to prevent any new
    deletions against the segments that were just merged.  In fact,
    this is a real danger: after a merge finishes, if one continues to
    use an older reader to do deletions you get into trouble.

  * I still don't really like having both the IR and IW able to do
    deletions, with slightly different semantics.  As it stands now,
    since you can't predict when IW materializes deletes, your reader
    will suddenly see a bunch of deletes appear.  I think it's better
    if no deletes appear, ever, until you reopen your reader.  Maybe
    we simply prevent deletion through the IR?

  * We need some serious unit tests here!
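
As a rough illustration of the "single source for an SR" and "materialize
vs. flush deletes" points above, here is a minimal Java sketch.  It does
not use Lucene's classes: PooledSegmentReader, SegmentReaderPool and every
method name below are hypothetical stand-ins, and java.util.BitSet stands
in for the BitVector.  It only shows the shape of the idea: all callers
share one reader instance per segment, deletes are recorded in RAM, and
writing them to disk is a separate, later step.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a SegmentReader; not Lucene's class. */
final class PooledSegmentReader {
    final String segmentName;
    private final BitSet deletedDocs = new BitSet(); // deletes materialized to docIDs, in RAM
    private boolean deletesDirty = false;            // true while RAM deletes are not on disk
    private int refCount = 0;

    PooledSegmentReader(String segmentName) {
        this.segmentName = segmentName;
    }

    /** Materialize a delete: record the docID in RAM only, no .del file is written. */
    synchronized void delete(int docId) {
        if (!deletedDocs.get(docId)) {
            deletedDocs.set(docId);
            deletesDirty = true;
        }
    }

    synchronized boolean isDeleted(int docId) {
        return deletedDocs.get(docId);
    }

    synchronized boolean hasUnflushedDeletes() {
        return deletesDirty;
    }

    /** Point-in-time copy; later deletes do not show up in the returned BitSet. */
    synchronized BitSet snapshotDeletes() {
        return (BitSet) deletedDocs.clone();
    }

    /** Called after the deletes were written out by some separate flush step. */
    synchronized void markFlushed() {
        deletesDirty = false;
    }

    synchronized void incRef() {
        refCount++;
    }

    synchronized int decRef() {
        return --refCount;
    }
}

/** Hypothetical single source of readers, keyed by segment name. */
final class SegmentReaderPool {
    private final Map<String, PooledSegmentReader> readers = new HashMap<>();

    /** Returns the one shared instance for this segment, creating it if needed. */
    synchronized PooledSegmentReader acquire(String segmentName) {
        PooledSegmentReader r = readers.computeIfAbsent(segmentName, PooledSegmentReader::new);
        r.incRef();
        return r;
    }

    /** Drops the reader once unused, but keeps it while it holds unflushed deletes. */
    synchronized void release(PooledSegmentReader r) {
        if (r.decRef() == 0 && !r.hasUnflushedDeletes()) {
            readers.remove(r.segmentName);
        }
    }

    synchronized int size() {
        return readers.size();
    }
}
```

With this shape, the code applying deletes and the code kicking off a
merge would both call acquire("_0") and get the same instance, so deletes
that exist only in RAM are visible to the merge.  A reader that wants
point-in-time behavior could take snapshotDeletes() when it is (re)opened.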


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is that an IndexReader and an IndexWriter cannot
> both be open and perform updates at the same time, as they both require
> a write lock on the index. While methods such as IW.deleteDocuments enable
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

