lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1194) Add deleteByQuery to IndexWriter
Date Tue, 26 Feb 2008 20:06:51 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12572637#action_12572637 ]

Michael McCandless commented on LUCENE-1194:
--------------------------------------------

{quote}
When autoCommit is true, we have to flush deletes with added documents
for update atomicity, don't we? UpdateByQuery can be added, if there is a
need.
{quote}
Good question Ning!

As of LUCENE-1044, when autoCommit=true, IndexWriter commits only when
a merge is committed, not on every flush.

Hmmm ... but there is actually the reverse problem now with my patch:
an auto commit can commit deletes before the corresponding added docs
(from updateDocument calls) are committed.  This is because, when we
commit, we only sync & commit the merged segments (not the flushed
segments).  However, autoCommit=true is deprecated; once we remove it
(in 3.0) this problem goes away.  I'll have to ponder how to fix this
in the meantime; it's tricky.  Maybe before 3.0 we'll just have to
flush all deletes whenever we flush a new segment....

Also, I don't think we need updateByQuery?  E.g., in 3.0, when
autoCommit is hardwired to false, you can call deleteDocuments(Query)
and then addDocument(...), and the pair will be atomic because neither
change is visible to readers until you commit.
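
To illustrate why delete-then-add behaves atomically once nothing is
committed automatically, here is a minimal toy model.  It is not Lucene
code: the class AtomicUpdateSketch, its fields, and the use of a
Predicate<String> in place of a Query are all assumptions made for
illustration.  Only the method names deleteDocuments, addDocument and
commit mirror the IndexWriter calls discussed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model (hypothetical, not Lucene): changes become visible only on commit(). */
final class AtomicUpdateSketch {
    // Snapshot that readers see; replaced wholesale on commit().
    private List<String> committed = new ArrayList<>();
    // Uncommitted working state of the writer.
    private final List<String> working = new ArrayList<>();

    void addDocument(String doc) {
        working.add(doc);
    }

    // A Predicate<String> stands in for a Lucene Query in this sketch.
    void deleteDocuments(Predicate<String> query) {
        working.removeIf(query);
    }

    // Publishes deletes and adds together, so a reader never sees one without the other.
    void commit() {
        committed = new ArrayList<>(working);
    }

    // Returns a copy of the last committed snapshot.
    List<String> openReader() {
        return new ArrayList<>(committed);
    }
}
```

In this model a reader opened between the delete and the add still sees
the old document, and after commit() it sees only the replacement.  With
autoCommit=true the real writer could publish an intermediate state,
which is the problem described above.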

{quote}
Because of renumbering, we don't have to flush deletes at the start of
every merge, right?
{quote}

Exactly.  E.g., we could carry the deletes indefinitely and then flush
them only on close (assuming autoCommit=false), but...

{quote}
But it is a good time to flush deletes.
{quote}

Right.  I think you want to flush all deletes that apply to the
segments being merged so merging will compact them.  Right now I apply
all buffered deletes to all segments when any merge is started.  It
may be possible to only apply them to those segments about to be
merged, but, I think it gets rather complex to track which buffered
deletes then need to apply to which segments.


> Add deleteByQuery to IndexWriter
> --------------------------------
>
>                 Key: LUCENE-1194
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1194
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.4
>
>         Attachments: LUCENE-1194.patch
>
>
> This has been discussed several times recently:
>   http://markmail.org/message/awlt4lmk3533epbe
>   http://www.gossamer-threads.com/lists/lucene/java-user/57384#57384
> If we add deleteByQuery to IndexWriter then this is a big step towards
> allowing IndexReader to be readonly.
> I took the approach suggested in that first thread: I buffer delete
> queries just like we now buffer delete terms, holding the max docID
> that the delete should apply to.
> Then, I also decoupled flushing deletes (mapping term or query -->
> actual docIDs that need deleting) from flushing added documents, and
> now I flush deletes only when a merge is started, or on commit() or
> close().  SegmentMerger now exports the docID map it used when
> merging, and I use that to renumber the max docIDs of all pending
> deletes.
> Finally, I turned off tracking of memory usage of pending deletes
> since they now live beyond each flush.  Deletes are now only
> explicitly flushed if you set maxBufferedDeleteTerms to something
> other than DISABLE_AUTO_FLUSH.  Otherwise they are flushed at the
> start of every merge.
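
The buffering and renumbering described in the issue summary above can
be modeled roughly as follows.  This is a sketch under stated
assumptions, not the code in the patch: BufferedDeletesSketch,
PendingDelete, docIdLimit and mergeAll are hypothetical names, a
Predicate<String> stands in for a Query, the whole index is treated as
one flat docID space, and the limit is exclusive (the delete applies to
docIDs below it).  The merge here deliberately does not flush pending
deletes first, so that the effect of renumbering is visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model (hypothetical, not Lucene) of buffered deletes with a docID limit. */
final class BufferedDeletesSketch {
    final List<String> docs = new ArrayList<>();
    final List<Boolean> deleted = new ArrayList<>();
    final List<PendingDelete> pending = new ArrayList<>();

    /** A buffered delete: applies only to docIDs strictly below docIdLimit. */
    static final class PendingDelete {
        final Predicate<String> query; // stand-in for a Query
        int docIdLimit;

        PendingDelete(Predicate<String> query, int docIdLimit) {
            this.query = query;
            this.docIdLimit = docIdLimit;
        }
    }

    void addDocument(String doc) {
        docs.add(doc);
        deleted.add(false);
    }

    // Buffer the delete; it must not affect documents added later.
    void deleteDocuments(Predicate<String> query) {
        pending.add(new PendingDelete(query, docs.size()));
    }

    // Resolve each buffered delete to actual docIDs and mark them deleted.
    void flushDeletes() {
        for (PendingDelete d : pending) {
            for (int id = 0; id < d.docIdLimit; id++) {
                if (d.query.test(docs.get(id))) {
                    deleted.set(id, true);
                }
            }
        }
        pending.clear();
    }

    // Compact away deleted docs and return the old -> new docID map (-1 = removed).
    // Pending deletes keep working because their limits are renumbered.
    int[] mergeAll() {
        int[] docMap = new int[docs.size()];
        List<String> live = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            if (deleted.get(id)) {
                docMap[id] = -1;
            } else {
                docMap[id] = live.size();
                live.add(docs.get(id));
            }
        }
        for (PendingDelete d : pending) {
            int newLimit = 0; // number of surviving docs below the old limit
            for (int id = 0; id < d.docIdLimit; id++) {
                if (docMap[id] != -1) {
                    newLimit++;
                }
            }
            d.docIdLimit = newLimit;
        }
        docs.clear();
        docs.addAll(live);
        deleted.clear();
        for (int i = 0; i < docs.size(); i++) {
            deleted.add(false);
        }
        return docMap;
    }

    List<String> liveDocs() {
        List<String> out = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            if (!deleted.get(id)) {
                out.add(docs.get(id));
            }
        }
        return out;
    }
}
```

Without the renumbering step in mergeAll, a buffered delete's limit
would still refer to pre-merge docIDs, so it could reach past the end
of the compacted index or hit a document added after the delete was
buffered.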

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
