lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter
Date Thu, 05 Mar 2009 10:11:56 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679140#action_12679140 ]

Michael McCandless commented on LUCENE-1516:
--------------------------------------------

Thanks Jason, good progress...

{quote}
I'm not sure we need to write out the deletes of unused segment
readers because they are no longer used and so should not be
required.
{quote}

EG I think DocumentsWriter.applyDeletes should get the reader from
the pool, do deletes, then call pool.release(reader), and that release
would write the changes to disk if we're not pooling readers.  Then we
don't need a new "flushDeletesToDir" propagated around.

On an explicit commit(), we should also sweep the pool and write
changes to disk for any SR that has pending changes.
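
A minimal sketch of the get/release/commit flow described above, not
the actual patch.  SketchReader and SketchReaderPool are hypothetical
stand-ins for SegmentReader and the reader pool, and "writing to disk"
is modeled as clearing the buffered deletes and bumping a counter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a SegmentReader that buffers deletes in memory.
class SketchReader {
    final String segment;
    final List<Integer> pendingDeletes = new ArrayList<>();
    int flushCount = 0; // how many times changes were "written to disk"

    SketchReader(String segment) {
        this.segment = segment;
    }

    void delete(int docId) {
        pendingDeletes.add(docId);
    }

    boolean hasChanges() {
        return !pendingDeletes.isEmpty();
    }

    // Stand-in for writing a new .del file; a no-op when nothing is pending.
    void flushChanges() {
        if (hasChanges()) {
            pendingDeletes.clear();
            flushCount++;
        }
    }
}

// Hypothetical pool; "pooling" says whether readers stay open across releases.
class SketchReaderPool {
    private final Map<String, SketchReader> readers = new HashMap<>();
    private final boolean pooling;

    SketchReaderPool(boolean pooling) {
        this.pooling = pooling;
    }

    synchronized SketchReader get(String segment) {
        return readers.computeIfAbsent(segment, SketchReader::new);
    }

    // When not pooling, release writes pending changes and drops the reader.
    synchronized void release(SketchReader reader) {
        if (!pooling) {
            reader.flushChanges();
            readers.remove(reader.segment);
        }
    }

    // Explicit commit(): sweep the pool and write changes for every reader.
    synchronized void commit() {
        for (SketchReader reader : readers.values()) {
            reader.flushChanges();
        }
    }
}
```

With pooling on, release leaves the deletes in memory and only commit()
writes them; with pooling off, release itself does the write, so
callers never need a separate flush path.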

{quote}
> You commented out the last part of commitMergedDeletes, that
actually saves the deletes. You need to instead get the reader for
the merged segment from the pool and hand it the new deletes.

I wasn't sure what you meant by this, in the patch the deletes are
copied into the merged reader. Do you mean instead the merged reader
should not be opened and instead the deletes file needs to be written
to?
{quote}

Whoops -- sorry: you are doing it correctly (applying to the
mergedReader).  I missed that.

bq. SegmentMerger wasn't always returning the docMap so I stopped using it.

It should always return it, unless there were no deletes on that
segment when the merge started, in which case the docMap is null.  But
you can handle that case (just check if there are now any deletes, and
carry them over).

(Then we shouldn't need merge.segmentReadersClone, again).
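
A rough sketch of carrying deletes over with and without a docMap.
The layout is an assumption made for illustration: docMap[i] is the
doc's new id relative to the segment's docBase in the merged segment,
or -1 if the doc was already deleted when the merge started.
MergedDeletesSketch and its parameters are hypothetical names, not
SegmentMerger's API.

```java
import java.util.BitSet;

final class MergedDeletesSketch {
    /**
     * Marks docs in the merged segment as deleted when they were deleted
     * in the source segment after the merge started.
     *
     * docMap         null if the segment had no deletes at merge start
     * currentDeletes deletes on the source segment as of now
     * docCount       number of docs in the source segment
     * docBase        where this segment's docs start in the merged segment
     * mergedDeletes  deletes to apply to the merged reader (output)
     */
    static void carryOverDeletes(int[] docMap, BitSet currentDeletes,
                                 int docCount, int docBase,
                                 BitSet mergedDeletes) {
        for (int oldDoc = currentDeletes.nextSetBit(0);
             oldDoc >= 0 && oldDoc < docCount;
             oldDoc = currentDeletes.nextSetBit(oldDoc + 1)) {
            if (docMap == null) {
                // No deletes at merge start: ids were copied unchanged.
                mergedDeletes.set(docBase + oldDoc);
            } else if (docMap[oldDoc] != -1) {
                // Doc survived the merge but was deleted afterwards.
                mergedDeletes.set(docBase + docMap[oldDoc]);
            }
            // docMap[oldDoc] == -1: already dropped by the merge itself.
        }
    }
}
```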

{quote}
The solution, manually incref the .del file after the pool
commits on the SRs.
{quote}

That's spooky -- why exactly is it needed?  We should only do an extra
incRef if we can explain exactly which decRef it will correspond to.
Shouldn't IW.checkpoint still work properly for the .del files?
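
To illustrate the pairing, here is a simplified model of file
ref-counting with a checkpoint.  RefCountedFilesSketch is hypothetical
and only loosely inspired by what IFD does; it is not the real code.
Each checkpoint incRefs the files it now references and decRefs the
files of the previous checkpoint, so every incRef has an obvious
matching decRef.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of ref-counted index files.
class RefCountedFilesSketch {
    private final Map<String, Integer> refCounts = new HashMap<>();
    private final Set<String> deleted = new HashSet<>();
    private List<String> lastCheckpoint = new ArrayList<>();

    void incRef(String file) {
        refCounts.merge(file, 1, Integer::sum);
    }

    // A file is deleted as soon as its count drops to zero.
    void decRef(String file) {
        Integer count = refCounts.get(file);
        if (count == null) {
            throw new IllegalStateException("decRef without matching incRef: " + file);
        }
        if (count == 1) {
            refCounts.remove(file);
            deleted.add(file);
        } else {
            refCounts.put(file, count - 1);
        }
    }

    // incRef the new file set first, then decRef the previous one, so
    // files shared by both checkpoints never hit zero in between.
    void checkpoint(Collection<String> currentFiles) {
        for (String file : currentFiles) {
            incRef(file);
        }
        for (String file : lastCheckpoint) {
            decRef(file);
        }
        lastCheckpoint = new ArrayList<>(currentFiles);
    }

    boolean isDeleted(String file) {
        return deleted.contains(file);
    }

    int refCount(String file) {
        return refCounts.getOrDefault(file, 0);
    }
}
```

In this model, advancing delGen and then checkpointing is enough to
keep the new .del file alive and remove the old one, with no manual
incRef.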

Also, SegmentInfo.files should be cleared whenever delGen is advanced
-- can you give more details on what path doesn't properly clear the
files?

The fact that an SR can carry deletes in memory without writing a new
.del file should not impact IFD.  Whenever we do advance delGen and
write new .del files, then we must call checkpoint().

So I think we need to really explain the root cause here...

Others:

  * You overrode decRef in DirectoryIndexReader, to not write changes
    whenever a writer is present, but I think that a better approach
    is to leave decRef as it is and then from writer, clear hasChanges
    when you want to discard them (because they were merged away).

  * You inserted applyDeletes at the top of commitMergedDeletes -- why
    was that needed?

  * There's a lot of noise in the patch -- whitespace changes,
    commented out debug code, etc.  Can you remove some of it?  It
    makes it harder to separate signal from noise...


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch,
LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch,
LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch,
LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch,
LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

