From: "Michael McCandless (JIRA)"
To: java-dev@lucene.apache.org
Date: Tue, 26 Feb 2008 12:06:51 -0800 (PST)
Subject: [jira] Commented: (LUCENE-1194) Add deleteByQuery to IndexWriter
Message-ID: <954796122.1204056411842.JavaMail.jira@brutus>
In-Reply-To: <2014221350.1204041295996.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/LUCENE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12572637#action_12572637 ]

Michael McCandless commented on LUCENE-1194:
--------------------------------------------

{quote}
When autoCommit is true, we have to flush deletes with added documents for update atomicity, don't we? UpdateByQuery can be added, if there is a need.
{quote}

Good question, Ning! As of LUCENE-1044, when autoCommit=true, IndexWriter only commits when it commits a merge, not with every flush.

Hmmm ... but there is actually the reverse problem now with my patch: an auto commit can commit deletes before the corresponding added docs (from updateDocument calls) are committed. This is because, when we commit, we only sync & commit the merged segments (not the flushed segments). Though autoCommit=true is deprecated; once we remove it (in 3.0) this problem goes away.

I'll have to ponder how to fix that for the releases before 3.0 ... it's tricky. Maybe before 3.0 we'll just have to flush all deletes whenever we flush a new segment....

Also, I don't think we need updateByQuery? E.g., in 3.0, when autoCommit is hardwired to false, you can call deleteDocuments(Query) and then addDocument(...) and it will be atomic.

{quote}
Because of renumbering, we don't have to flush deletes at the start of every merge, right?
{quote}

Exactly. E.g., we could carry the deletes indefinitely and then flush them only on close (assuming autoCommit=false), but...

{quote}
But it is a good time to flush deletes.
{quote}

Right. I think you want to flush all deletes that apply to the segments being merged so that merging will compact them. Right now I apply all buffered deletes to all segments when any merge is started. It may be possible to apply them only to the segments about to be merged, but I think it gets rather complex to track which buffered deletes then need to apply to which segments.
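
To illustrate the atomicity point about deleteDocuments(Query) followed by addDocument(...) when autoCommit is false, here is a minimal toy model in plain Java. It is not Lucene code and does not reflect how IndexWriter is implemented: the class AtomicUpdateSketch, its String "documents", the Predicate standing in for a Query, and readerView() are all assumptions made up for this sketch. It only shows the visibility rule being relied on: nothing the writer does is seen by readers until commit(), so the delete and the add become visible together.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model (hypothetical, not Lucene): changes stay private to the writer until commit(). */
final class AtomicUpdateSketch {
    // State that a newly opened reader would see.
    private List<String> committed = new ArrayList<>();
    // The writer's private, uncommitted view.
    private final List<String> working = new ArrayList<>();

    /** Stand-in for deleteDocuments(Query); the predicate plays the role of the query. */
    void deleteDocuments(Predicate<String> query) {
        working.removeIf(query);
    }

    /** Stand-in for addDocument(...). */
    void addDocument(String doc) {
        working.add(doc);
    }

    /** Publishes every pending delete and add in a single step. */
    void commit() {
        committed = new ArrayList<>(working);
    }

    /** Returns a snapshot of the last committed state. */
    List<String> readerView() {
        return new ArrayList<>(committed);
    }

    public static void main(String[] args) {
        AtomicUpdateSketch writer = new AtomicUpdateSketch();
        writer.addDocument("id=1 v1");
        writer.addDocument("id=2 v1");
        writer.commit();

        // "Update by query": delete the old version, then add the new one.
        writer.deleteDocuments(doc -> doc.startsWith("id=1 "));
        writer.addDocument("id=1 v2");

        System.out.println(writer.readerView()); // prints [id=1 v1, id=2 v1]
        writer.commit();
        System.out.println(writer.readerView()); // prints [id=2 v1, id=1 v2]
    }
}
```

In this model there is no moment at which a reader sees the delete without the add. The reverse problem described above for autoCommit=true corresponds to publishing the two changes at different times.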
> Add deleteByQuery to IndexWriter
> --------------------------------
>
>          Key: LUCENE-1194
>          URL: https://issues.apache.org/jira/browse/LUCENE-1194
>      Project: Lucene - Java
>   Issue Type: New Feature
>     Reporter: Michael McCandless
>     Assignee: Michael McCandless
>     Priority: Minor
>      Fix For: 2.4
>
>  Attachments: LUCENE-1194.patch
>
>
> This has been discussed several times recently:
>   http://markmail.org/message/awlt4lmk3533epbe
>   http://www.gossamer-threads.com/lists/lucene/java-user/57384#57384
> If we add deleteByQuery to IndexWriter then this is a big step towards
> allowing IndexReader to be readonly.
> I took the approach suggested in that first thread: I buffer delete
> queries just like we now buffer delete terms, holding the max docID
> that the delete should apply to.
> Then, I also decoupled flushing deletes (mapping term or query -->
> actual docIDs that need deleting) from flushing added documents, and
> now I flush deletes only when a merge is started, or on commit() or
> close(). SegmentMerger now exports the docID map it used when
> merging, and I use that to renumber the max docIDs of all pending
> deletes.
> Finally, I turned off tracking of memory usage of pending deletes
> since they now live beyond each flush. Deletes are now only
> explicitly flushed if you set maxBufferedDeleteTerms to something
> other than DISABLE_AUTO_FLUSH. Otherwise they are flushed at the
> start of every merge.
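
The buffering scheme in the issue description (delete queries held with a max docID, renumbered when a merge compacts the index) can be sketched with a small toy model in plain Java. This is not the patch and not Lucene code: BufferedDeletesSketch, PendingDelete, the String "documents", the Predicate standing in for a Query, and the prefix-count array standing in for the docID map exported by SegmentMerger are all assumptions invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model (hypothetical, not Lucene): delete queries are buffered with a docID limit. */
final class BufferedDeletesSketch {
    /** A buffered delete; the predicate stands in for a Query. */
    private static final class PendingDelete {
        final Predicate<String> query;
        int docIdLimit; // the delete applies only to docIDs below this value

        PendingDelete(Predicate<String> query, int docIdLimit) {
            this.query = query;
            this.docIdLimit = docIdLimit;
        }
    }

    private final List<String> docs = new ArrayList<>(); // position = docID, null = deleted
    private final List<PendingDelete> pending = new ArrayList<>();

    void addDocument(String doc) {
        docs.add(doc);
    }

    /** Buffers the delete; documents added later get docIDs at or above the limit. */
    void deleteDocuments(Predicate<String> query) {
        pending.add(new PendingDelete(query, docs.size()));
    }

    /** Resolves each buffered query to concrete docIDs and marks them deleted. */
    void flushDeletes() {
        for (PendingDelete delete : pending) {
            for (int id = 0; id < delete.docIdLimit; id++) {
                String doc = docs.get(id);
                if (doc != null && delete.query.test(doc)) {
                    docs.set(id, null);
                }
            }
        }
        pending.clear();
    }

    /** Compacts already-deleted slots, then renumbers the limits of pending deletes. */
    void mergeWithoutFlushing() {
        // liveBefore[i] = number of live docs with old docID < i (the toy "docID map").
        int[] liveBefore = new int[docs.size() + 1];
        List<String> compacted = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            liveBefore[id] = compacted.size();
            if (docs.get(id) != null) {
                compacted.add(docs.get(id));
            }
        }
        liveBefore[docs.size()] = compacted.size();
        for (PendingDelete delete : pending) {
            delete.docIdLimit = liveBefore[delete.docIdLimit];
        }
        docs.clear();
        docs.addAll(compacted);
    }

    List<String> liveDocs() {
        List<String> live = new ArrayList<>();
        for (String doc : docs) {
            if (doc != null) {
                live.add(doc);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        BufferedDeletesSketch index = new BufferedDeletesSketch();
        index.addDocument("y");                               // docID 0
        index.addDocument("x old");                           // docID 1
        index.deleteDocuments(doc -> doc.startsWith("y"));
        index.flushDeletes();                                 // docID 0 is now deleted
        index.deleteDocuments(doc -> doc.startsWith("x"));    // buffered with limit 2
        index.addDocument("x new");                           // docID 2, outside the limit
        index.mergeWithoutFlushing();                         // "x old" -> 0, limit 2 -> 1
        index.flushDeletes();
        System.out.println(index.liveDocs());                 // prints [x new]
    }
}
```

Without the renumbering step, the buffered delete would keep its old limit of 2 after compaction and would also remove "x new", which was added after the delete was issued. In this toy model, that is why the pending limits have to follow the docID map.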