lucene-java-user mailing list archives

From Ajay Garg <ajaygargn...@gmail.com>
Subject Re: Query in IndexWriter.deleteDocuments(Term term)
Date Sat, 26 Jul 2008 10:40:01 GMT

Thanks, Mike. That was quite explanatory. A couple of doubts:

1. The deletions apply to flushed as well as buffered-in-RAM documents.
Right. So, if the index directory contains 1 document that matches a
deleteDocuments term, and RAM holds 1 document that matches the same
term, will the document in the index directory be deleted immediately,
or only when a flush is called? (It seems logical that, irrespective of
the location of the document, "actual" deletion occurs only when a flush
is called; I just need to be doubly sure.)

2. Yes, I am planning to rewrite a project using Lucene 2.3.2. So, is the
next version heading straight to 3.0? (Sorry if this question seems a
little out of context for the current thread.)

Looking forward to a reply.

Thanks
Ajay Garg

Michael McCandless-2 wrote:
> 
> 
> java_is_everything wrote:
> 
>>
>> Hi all.
>>
>> This may seem a longish and informal mail, but do correct me if my
>> assumptions are wrong anywhere, otherwise my actual doubt will make no
>> sense.
>>
>> Say I opened an IndexWriter on an initially empty directory, using
>> autocommit = true. Now, what I do is add and delete documents randomly. I
>> set "x" as maxBufferedDocs and "y" as maxBufferedDeleteTerms (x < y).
>>
>> IndexWriter starts its work. Now, I perform the following sequence:
>>
>> STAGE 1 :
>> Add "x-2" documents one after another.    Total docs in memory = x-2    (1)
>> Delete 3 docs from memory.                Total docs in memory = x-5    (2)
>> Add 5 docs one after another.             Total docs in memory = x      (3)
>>
>> STAGE 2 :
>> A flush happens, since maxBufferedDocs is reached.
>>                                           Total docs in memory = 0      (4)
> 
> One correction here: the added doc count that triggers a flush does  
> *not* measure deletions.  So, in your step (3) above, after having  
> added 2 of the 5 docs, IW will flush.  Then it has 3 added docs  
> buffered in RAM.
> 
>> Thus, it is also a commit.
> 
> Assuming you're talking about trunk at this point (not 2.3), because  
> only trunk distinguishes commit() vs flush(): there's no guarantee  
> exactly when IW does a commit() when autoCommit is true.  Also,  
> autoCommit is deprecated, meaning in 3.0 it will be hardwired to  
> false, so your application must commit() or close() when it's necessary.
> 
>> STAGE 3 :
>> Add x-10 docs one after another.          Total docs in memory = x-10   (5)
> 
> Actually x-7 in memory now.
> 
>> NOW ... I call deleteDocuments(Term term), which has potential matches at
>> two places:                                                             (6)
>> a) x-15 (out of x-10) documents currently residing in memory.
>> b) x-20 (out of x) documents currently in the index directory.
>>
>> IndexWriter.close() is called.                                          (7)
>>
>> Now, my question is, will the index contain
>> (i) a total of (x) + (x-10) - (x-15) documents
>> (ii) a total of (x) + (x-10) - (x-20) documents
>> (iii) a total of (x) + (x-10) - (x-15) - (x-20) documents
> 
> Index will contain (iii), corrected to (x) + (x-7) - (x-15) - (x-20).
> I.e., the deletion always applies to any documents, flushed or buffered
> in RAM.  The deletion is fully independent of what buffering IW is
> doing.
> 
>> Secondly, will the answer change had I opened the IndexWriter in
>> autocommit = false mode?
> 
> No, you get the same result.  autoCommit simply affects *when* the
> changes become visible/durable to an external reader, not what changes
> occur.  Any series of changes using an IW will produce the same final
> result regardless of autoCommit, assuming we're not talking about
> JRE/machine crashes, etc.
> 
>> Several other permutations of (autocommit mode), (points of flush call),
>> (points of close call) exist, but I guess I will be fine if I get the
>> answer to the first question itself. A little explanation would be highly
>> appreciated.
> 
> autoCommit, points of flushing, points of committing, points of
> merging, etc., should be fully independent of what changes
> (adds/deletes) you are doing.
> 
> Mike
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 
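The counting in the quoted reply can be replayed with a small toy model. This is only a sketch under stated assumptions: ToyWriter and all of its methods are made up for illustration, it is not Lucene code, and it says nothing about how IndexWriter works internally. It mimics just the two rules stated above: the flush trigger counts added docs and ignores deletions, and a delete applies to flushed and buffered docs alike. x = 30 is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only. This is NOT Lucene code and does not reflect IndexWriter internals.
class ToyWriter {
    private final int maxBufferedDocs;                        // the "x" from the thread
    private final List<String> flushed = new ArrayList<>();   // stands in for the index directory
    private final List<String> buffered = new ArrayList<>();  // stands in for docs held in RAM
    private int addsSinceFlush = 0;                           // counts adds only, never deletes

    ToyWriter(int maxBufferedDocs) {
        this.maxBufferedDocs = maxBufferedDocs;
    }

    void addDocument(String tag) {
        buffered.add(tag);
        addsSinceFlush++;
        if (addsSinceFlush >= maxBufferedDocs) {
            flush();
        }
    }

    // Removes every doc carrying this tag, whether it was flushed or is still buffered.
    void deleteDocuments(String tag) {
        flushed.removeIf(tag::equals);
        buffered.removeIf(tag::equals);
    }

    void flush() {
        flushed.addAll(buffered);
        buffered.clear();
        addsSinceFlush = 0;
    }

    int bufferedDocs() { return buffered.size(); }

    int totalDocs() { return flushed.size() + buffered.size(); }

    // Replays stages 1 to 3 of the thread. Returns the buffered doc count
    // right after the five adds of step (3) and again after stage 3.
    static int[] replayStages(int x) {
        ToyWriter w = new ToyWriter(x);
        for (int i = 0; i < x - 2; i++) w.addDocument(i < 3 ? "old" : "keep");
        w.deleteDocuments("old");                            // deletes 3 buffered docs
        for (int i = 0; i < 5; i++) w.addDocument("keep");   // flush fires after the 2nd add
        int afterStepThree = w.bufferedDocs();
        for (int i = 0; i < x - 10; i++) w.addDocument("keep");
        return new int[] { afterStepThree, w.bufferedDocs() };
    }

    // Same adds and deletes under a given flush threshold; returns the final doc count.
    static int finalCount(int maxBufferedDocs) {
        ToyWriter w = new ToyWriter(maxBufferedDocs);
        for (int i = 0; i < 40; i++) w.addDocument(i % 4 == 0 ? "hit" : "keep");
        w.deleteDocuments("hit");                            // matches flushed and buffered docs
        for (int i = 0; i < 5; i++) w.addDocument("keep");
        w.flush();                                           // stands in for close()
        return w.totalDocs();
    }

    public static void main(String[] args) {
        int[] counts = replayStages(30);
        System.out.println("buffered after step (3):   " + counts[0]);
        System.out.println("buffered after stage 3:    " + counts[1]);
        System.out.println("final docs, threshold 7:   " + finalCount(7));
        System.out.println("final docs, threshold 100: " + finalCount(100));
    }
}
```

With x = 30 the toy reports 3 docs buffered after step (3) and 23 (that is, x-7) after stage 3, matching the corrections in the quoted reply. The final count comes out as 35 for both thresholds, which is the toy-model version of "points of flushing should be fully independent of what changes you are doing".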

-- 
View this message in context: http://www.nabble.com/Query-in-IndexWriter.deleteDocuments%28Term-term%29-tp18662995p18665652.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

