lucene-dev mailing list archives

From "Jason Rutherglen (JIRA)" <>
Subject [jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter
Date Thu, 19 Feb 2009 18:16:02 GMT


Jason Rutherglen commented on LUCENE-1516:

Mike, good points... 

{quote} since you can't predict when IW materializes deletes, your reader
will suddenly see a bunch of deletes appear.{quote}

The reader would need to be reopened to see the deletes. Isn't that
expected behavior?

{quote} Instead of having to merge readers, I think we need a single
source to obtain an SR from {quote}

I like this; however, how would IR.clone work? I like having the
internal reader separate from the external reader. The main reason to
expose an IR from IW is to allow delete by doc id and norms updates
(and, eventually, column-stride field updates). I don't see how we can
grab a reader during a merge and also block realtime deletes occurring
on the external reader. Yet it is also difficult to reconcile deletes
made to an external SR that's been merged away.

It seems like we're getting closer to using a unique long UID for
each doc that is carried over between merges. I was going to
implement this on top of LUCENE-1516; however, we may want to make
UIDs part of LUCENE-1516 to implement the behavior we're discussing.

If the updates to SR are queued, then it seems like the only way to
achieve this is with a doc UID. That way merges can happen in the
background, and the IR has a mechanism for mapping its queue to the
newly merged segments when flushed. Hopefully we aren't wreaking
havoc with the IndexReader API?
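A minimal sketch of the UID idea, assuming each document carries a
stable long UID that survives merges (`UidSegment` and
`UidDeleteQueue` are hypothetical names, not Lucene API). Deletes by
doc id are translated to UIDs and queued; on flush each UID is
resolved against whatever segments exist at that moment, so a merge
in between does not lose the delete.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical toy segment: a name, a per-doc UID array, and its deleted docs. */
final class UidSegment {
    final String name;
    final long[] uids;               // uids[docId] = stable UID carried across merges
    final BitSet deleted = new BitSet();

    UidSegment(String name, long[] uids) {
        this.name = name;
        this.uids = uids;
    }
}

/** Hypothetical queue of deletes keyed by UID, so they survive merges that renumber doc ids. */
final class UidDeleteQueue {
    private final Queue<Long> pending = new ArrayDeque<>();

    /** Reader side: translate a segment-local doc id to its UID and queue it. */
    void deleteByDocId(UidSegment seg, int docId) {
        pending.add(seg.uids[docId]);
    }

    /** Flush side: resolve each queued UID against the segments that exist now. */
    int applyTo(Iterable<UidSegment> currentSegments) {
        Map<Long, UidSegment> owner = new HashMap<>();
        Map<Long, Integer> docOf = new HashMap<>();
        for (UidSegment s : currentSegments) {
            for (int doc = 0; doc < s.uids.length; doc++) {
                owner.put(s.uids[doc], s);
                docOf.put(s.uids[doc], doc);
            }
        }
        int applied = 0;
        while (!pending.isEmpty()) {
            long uid = pending.poll();
            UidSegment s = owner.get(uid);
            if (s != null) {         // UID may be gone, e.g. already deleted and merged away
                s.deleted.set(docOf.get(uid));
                applied++;
            }
        }
        return applied;
    }
}
```

Rebuilding the UID map on every flush costs time proportional to the
total number of docs; a real implementation would presumably maintain
that mapping incrementally as segments are flushed and merged.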

The scenario I think we're missing is when there are multiple cloned
SRs out there. With the model where IW checks out an SR, how do we
allow cloning? Are a clone's updates placed into a central queue on
the original SR? Is the queue drained automatically on a merge or
IW.flush? What happens when we want the IR deletes to be searchable
without flushing to disk? Do a reopen/clone?
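One possible answer to the cloning questions, sketched under the
assumption that the writer owns a single shared queue
(`CentralDeleteQueue` and `QueuedCloneReader` are hypothetical names).
Every clone appends to the same queue, the writer drains it on merge
or flush, and a clone's view of deletes is frozen when the clone is
created, so pending deletes become searchable through a new clone
without anything being written to disk.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical central queue owned by the writer; every clone appends to it. */
final class CentralDeleteQueue {
    private final List<Integer> pending = new ArrayList<>();

    synchronized void add(int docId) {
        pending.add(docId);
    }

    /** Drained by the writer on merge or flush; the queue is empty afterwards. */
    synchronized List<Integer> drain() {
        List<Integer> out = new ArrayList<>(pending);
        pending.clear();
        return out;
    }

    /** Flushed deletes plus everything still queued, computed in memory only. */
    synchronized BitSet snapshot(BitSet flushed) {
        BitSet view = (BitSet) flushed.clone();
        for (int doc : pending) {
            view.set(doc);
        }
        return view;
    }
}

/** Hypothetical clone: deletes go to the shared queue; visibility is fixed at clone time. */
final class QueuedCloneReader {
    private final CentralDeleteQueue queue;
    private final BitSet visibleDeletes;

    QueuedCloneReader(CentralDeleteQueue queue, BitSet flushedDeletes) {
        this.queue = queue;
        this.visibleDeletes = queue.snapshot(flushedDeletes);
    }

    void deleteDocument(int docId) {
        queue.add(docId);
    }

    boolean isDeleted(int docId) {
        return visibleDeletes.get(docId);
    }

    /** A new clone sees every delete queued so far, flushed or not. */
    QueuedCloneReader cloneReader(BitSet flushedDeletes) {
        return new QueuedCloneReader(queue, flushedDeletes);
    }
}
```

In this sketch a reader does not see its own deletes until it is
cloned again, which matches "no deletes appear until you reopen";
whether that is acceptable to IR.deleteDocument callers is still an
open question.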

bq. number of iterations before settling

Agreed, if it were simple it wouldn't be fun. ☺

{quote} It's not necessary for IW to write new .del files when it
materializes deletes.{quote}

Good point; DocumentsWriter.applyDeletes shouldn't be flushing to
disk, and this sounds like a test case to add to TestIndexWriterReader.
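A sketch of materializing deletes in RAM only, assuming a toy segment
with a term-to-docs map (`MemSegment` and its fields are hypothetical).
`applyDeletes` just sets bits and marks the segment dirty; only
`commit` stands in for writing a .del file, so a test along these lines
can assert that no .del write happens before commit.

```java
import java.util.BitSet;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory segment: docs per term, live deletes, and a dirty flag. */
final class MemSegment {
    final Map<String, List<Integer>> postings = new HashMap<>();
    final BitSet deletedDocs = new BitSet();
    boolean deletesDirty;        // true => in-memory deletes not yet persisted
    int delFilesWritten;         // counts simulated .del file writes

    /** Materialize buffered delete-by-term in RAM only; nothing is written here. */
    void applyDeletes(List<String> bufferedTerms) {
        for (String term : bufferedTerms) {
            for (int doc : postings.getOrDefault(term, Collections.<Integer>emptyList())) {
                if (!deletedDocs.get(doc)) {
                    deletedDocs.set(doc);
                    deletesDirty = true;
                }
            }
        }
    }

    /** Only commit persists deletes; this stands in for writing the .del file. */
    void commit() {
        if (deletesDirty) {
            delFilesWritten++;
            deletesDirty = false;
        }
    }
}
```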

{quote} IW.reopenInternalReader only does a clone not a reopen; however
does it cover the newly flushed segment? {quote}

The SegmentInfos is obtained from the writer. In the test case
testIndexWriterReopenSegment, it looks like using clone does pick up
the new segments.

{quote} I think it's better if no deletes appear, ever, until you reopen
your reader. Maybe we simply prevent deletion through the IR? {quote}

Preventing deletion through the IR would seem to defeat the purpose
of the patch unless there's some alternative mechanism for deleting
by doc id? 
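If the IR itself were read-only, one alternative mechanism for
deleting by doc id is to hand the writer both the reader and the doc
id and let the writer resolve the segment. The sketch below assumes
hypothetical `ReaderSnapshot` and `WriterDeleteBuffer` classes; the
doc-id-to-segment lookup is a binary search over per-segment start
offsets, similar in spirit to how a multi-segment reader maps doc ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical read-only view: segment names plus the composite doc id where each starts. */
final class ReaderSnapshot {
    final String[] segmentNames;
    final int[] starts;          // starts[i] = first doc id of segment i; last entry = maxDoc

    ReaderSnapshot(String[] segmentNames, int[] maxDocs) {
        this.segmentNames = segmentNames.clone();
        this.starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
    }

    /** Index of the segment holding composite doc id n. */
    int segmentIndex(int n) {
        if (n < 0 || n >= starts[starts.length - 1]) {
            throw new IllegalArgumentException("doc " + n);
        }
        int idx = Arrays.binarySearch(starts, 0, starts.length - 1, n);
        if (idx < 0) {
            idx = -idx - 2;      // last segment whose start is below n
        }
        while (starts[idx + 1] <= n) {
            idx++;               // skip empty segments sharing the same start
        }
        return idx;
    }
}

/** Hypothetical writer-side buffer: the reader never applies these deletes to itself. */
final class WriterDeleteBuffer {
    final List<String> buffered = new ArrayList<>();

    void deleteDocument(ReaderSnapshot reader, int docId) {
        int seg = reader.segmentIndex(docId);
        int local = docId - reader.starts[seg];
        buffered.add(reader.segmentNames[seg] + ":" + local);
    }
}
```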

{quote} commitMergedDeletes to decouple computing the new BitVector from
writing the .del file to disk.{quote}

A hidden method I never noticed. I'll keep it in mind.
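A hypothetical sketch of the "compute, then write" split for merged
deletes. Given each source segment's deletes when the merge started
and its deletes now, it computes the merged segment's deleted docs
purely in memory (a `java.util.BitSet` stands in for BitVector);
persisting them would be a separate step. `MergedDeletes.compute` is
an illustrative name, not the actual commitMergedDeletes signature.

```java
import java.util.BitSet;

/** Hypothetical sketch: carry deletes that arrived during a merge onto the merged segment. */
final class MergedDeletes {
    /**
     * @param maxDocs        maxDoc of each source segment
     * @param deletesAtStart deletes each source had when the merge started (dropped by the merge)
     * @param deletesNow     deletes each source has now (a superset of deletesAtStart)
     * @return the merged segment's deletes, computed in memory only
     */
    static BitSet compute(int[] maxDocs, BitSet[] deletesAtStart, BitSet[] deletesNow) {
        BitSet merged = new BitSet();
        int mergedDoc = 0;                       // next doc id in the merged segment
        for (int s = 0; s < maxDocs.length; s++) {
            for (int doc = 0; doc < maxDocs[s]; doc++) {
                if (deletesAtStart[s].get(doc)) {
                    continue;                    // never copied into the merged segment
                }
                if (deletesNow[s].get(doc)) {
                    merged.set(mergedDoc);       // deleted while the merge was running
                }
                mergedDoc++;
            }
        }
        return merged;
    }
}
```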

{quote} It seems like reader.reopen() (where reader was obtained with
IW.getReader()) doesn't do the right thing? (ie it's looking for the
most recent segments_N in the Directory, but it should be looking for
it @ IW.segmentInfos).{quote}

For a reader obtained from IW, the standard reopen implementation does
not seem necessary. Could it call clone underneath instead?
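A minimal sketch of what reopen could mean for a reader obtained from
the writer, assuming hypothetical `SketchWriter` and `SketchNrtReader`
classes where segments are just names. Reopen asks the writer for its
in-memory segment list, which can include flushed but uncommitted
segments that the newest segments_N in the Directory does not list yet.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical writer: its in-memory segment list may be ahead of the last commit. */
final class SketchWriter {
    private final List<String> segmentInfos = new ArrayList<>();   // includes uncommitted flushes
    private List<String> lastCommit = Collections.emptyList();     // what segments_N would list

    synchronized void flushSegment(String name) {
        segmentInfos.add(name);
    }

    synchronized void commit() {
        lastCommit = new ArrayList<>(segmentInfos);
    }

    synchronized List<String> committedSegments() {
        return lastCommit;
    }

    /** Like IW.getReader in the issue: a point-in-time view of the current segments. */
    synchronized SketchNrtReader getReader() {
        return new SketchNrtReader(this, new ArrayList<>(segmentInfos));
    }
}

/** Hypothetical reader bound to the writer that produced it. */
final class SketchNrtReader {
    private final SketchWriter writer;
    final List<String> segments;

    SketchNrtReader(SketchWriter writer, List<String> segments) {
        this.writer = writer;
        this.segments = Collections.unmodifiableList(segments);
    }

    /** Reopen consults the writer, not the Directory; returns this if nothing changed. */
    SketchNrtReader reopen() {
        SketchNrtReader fresh = writer.getReader();
        return fresh.segments.equals(segments) ? this : fresh;
    }
}
```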

> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>                 Key: LUCENE-1516
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch,
>   Original Estimate: 672h
>  Remaining Estimate: 672h
> The current problem is that an IndexReader and IndexWriter cannot be
> open at the same time and perform updates, as they both require a
> write lock on the index. While methods such as IW.deleteDocuments
> enable deleting from IW, methods such as IR.deleteDocument(int doc)
> and norms updating are not available from IW. This limits the
> ability to update the index dynamically or in realtime without
> closing the IW and opening an IR, deleting or updating norms,
> flushing, then opening the IW again, a process which can be
> detrimental to realtime updates.
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
