lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Query in IndexWriter.deleteDocuments(Term term)
Date Sat, 26 Jul 2008 09:41:26 GMT

java_is_everything wrote:

>
> Hi all.
>
> This may seem a longish and informal mail, but do correct me if my
> assumptions are wrong anywhere, otherwise my actual doubt will make no
> sense.
>
> Say I opened an IndexWriter on an initially empty directory, using
> autocommit = true. Now, what I do is add and delete documents  
> randomly. I
> set "x" as maxBufferedDocs and "y" as maxBufferedDeleteTerms (x < y).
>
> IndexWritrer starts its work. Now, I perfom the following sequences :
>
> STAGE 1 :
> Add "x-2" documents one after the another.                Total docs  
> in
> memory = x-2                (1)
> Delete 3 docs from memory                                        
> Total docs
> in memory = x-5               (2)
> Add 5 docs one after another                                     
> Total docs
> in memory = x                   (3)
>
> STAGE 2 :
> A flush happens, sice maxBufferedDocs reached.           Total docs in
> memory = 0                  (4)

One correction here: the added doc count that triggers a flush does  
*not* measure deletions.  So, in your step (3) above, after having  
added 2 of the 5 docs, IW will flush.  Then it has 3 added docs  
buffered in RAM.

> Thus, it is also a commit.

Assuming you're talking about trunk at this point (not 2.3), because  
only trunk distinguishes commit() vs flush(): there's no guarantee  
exactly when IW does a commit() when autoCommit is true.  Also,  
autoCommit is deprecated, meaning in 3.0 it will be hardwired to  
false, so your application must commit() or close() when it's necessary.

> STAGE 3 :
> Add x-10 docs one after other                                    
> Total docs
> in memory = x-10              (5)

Actually x-7 in memory now.

> NOW ... I call deleteDocuments(Term term), which has potential  
> matches at
> two places :
> a) x-15 (out of x-10) documents currently residing in memory.
> b) x-20 (out of x) documents currently in the index directory.
> (6)
>
> IndexWriter.close() is called
> (7)
>
>
> Now, my question is, will the index contain
> (i) a total of (x) + (x-10) - (x-15) documents
> (ii) a total of (x) + (x-10) - (x-20) documents
> (iii) a total of (x) + (x-10) - (x-15) - (x-20) documents

Index will contain (iii), corrected to (x) + (x-7) - (x-15) - (x-20).   
Ie the deletion always applies to any documents, flushed or buffered  
in RAM.  The deletion is fully independent of what buffering IW is  
doing.

> Secondly, will the answer change had I opened the IndexWriter in  
> autocommit
> = false mode?

No, you get the same result.  autoCommit simply affects *when* the  
changes become visible/durable to an external reader, not what changes  
occur.  Any series of changes using an IW will produce the same final  
result regardless of autoCommit, assuming we're not talking about JRE/ 
machine's crashing, etc.

> Several other permutations of (autocommit mode), (points of flush  
> call),
> (points of close call) exist, but I guess I will be fine if I get  
> the answer
> to the first question itself. A little explanation will be highly  
> obliged.

autoCommit, points of flushing, points of committing, points of  
merging, etc, should be fully independent of what changes (add/ 
deletes) you are doing.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message