lucene-dev mailing list archives

From Christoph Goller <gol...@detego-software.de>
Subject Re: PATCH: IndexWriter
Date Wed, 10 Sep 2003 16:37:29 GMT
My patch makes indexWriter.docCount() deliver the same values as
indexReader.maxDoc(). I attach a patch to TestIndexWriter that
demonstrates the difference. The bug in the current IndexWriter means
that indexWriter.docCount() never reflects the changes that result
from deleting documents.
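
To make the difference concrete, here is a minimal, self-contained
sketch that does not use Lucene at all. The Segment class and the two
mergedCountFrom... methods are hypothetical stand-ins invented for this
illustration (they are not Lucene's SegmentInfo or SegmentReader); they
only mimic the bookkeeping described in this thread, under the
assumption that a merge drops deleted documents:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy model of the doc-count bookkeeping; NOT Lucene code. */
public class DocCountSketch {

    /** Hypothetical stand-in for one segment together with its reader. */
    public static final class Segment {
        private final int docCount;              // count stored when the segment was written
        private final BitSet deleted = new BitSet();

        public Segment(int docCount) {
            this.docCount = docCount;
        }

        /** Marks a document as deleted; the stored count is left unchanged. */
        public void delete(int docId) {
            if (docId < 0 || docId >= docCount) {
                throw new IllegalArgumentException("no such document: " + docId);
            }
            deleted.set(docId);
        }

        /** Stored count, playing the role of si.docCount in the patch. */
        public int storedDocCount() {
            return docCount;
        }

        /** Live documents only, playing the role of reader.numDocs(). */
        public int numDocs() {
            return docCount - deleted.cardinality();
        }
    }

    /** Pre-patch style: sums the stored counts, so deletions are ignored. */
    public static int mergedCountFromInfos(List<Segment> segments) {
        int merged = 0;
        for (Segment s : segments) {
            merged += s.storedDocCount();
        }
        return merged;
    }

    /** Patched style: sums the live documents reported by each reader. */
    public static int mergedCountFromReaders(List<Segment> segments) {
        int merged = 0;
        for (Segment s : segments) {
            merged += s.numDocs();
        }
        return merged;
    }

    public static void main(String[] args) {
        // Same scenario as the attached test: 100 documents, then delete #0-49.
        Segment segment = new Segment(100);
        for (int i = 0; i < 50; i++) {
            segment.delete(i);
        }
        List<Segment> segments = new ArrayList<>();
        segments.add(segment);

        System.out.println("from stored counts: " + mergedCountFromInfos(segments));   // prints 100
        System.out.println("from readers:       " + mergedCountFromReaders(segments)); // prints 50
    }
}
```

In this toy model the merged segment physically holds 50 documents, so
only the reader-based sum matches what a reader opened on the merged
index would report.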

Christoph


Otis Gospodnetic wrote:
> Christoph,
> 
> The idea looks good, but the test fails for both the pre-patch and the
> patched versions of IndexWriter.
> 
> I converted your test to a JUnit test and will check it into CVS shortly.
> If I made a mistake in it, please point it out.
> You can run 'ant test-unit' to see where the test fails.
> 
> Otis
> 
> --- Christoph Goller <goller@detego-software.de> wrote:
> 
>>IndexWriter implements the method docCount(), which reads the number
>>of documents from the SegmentInfos of the index. However, it delivers
>>incorrect values if documents get deleted from the index. The reason
>>is that the SegmentInfo.docCount values are updated incorrectly when
>>segments get merged: the new value is taken from the old SegmentInfos.
>>It would be better to take the value from the reader instead. That
>>way, indexWriter.docCount() would deliver the same value as
>>indexReader.maxDoc().
>>
>>Test and patch are attached,
>>Christoph
>>
>>
>>-- 
>>*****************************************************************
>>* Dr. Christoph Goller       Tel.:   +49 89 203 45734           *
>>* Detego Software GmbH       Mobile: +49 179 1128469            *
>>* Keuslinstr. 13             Fax.:   +49 721 151516176          *
>>* 80798 München, Germany     Email:  goller@detego-software.de  *
>>*****************************************************************
>>
>>Index: IndexWriter.java
>>===================================================================
>>RCS file: /home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/index/IndexWriter.java,v
>>retrieving revision 1.14
>>diff -u -r1.14 IndexWriter.java
>>--- IndexWriter.java	12 Aug 2003 15:05:03 -0000	1.14
>>+++ IndexWriter.java	3 Sep 2003 14:55:33 -0000
>>@@ -355,7 +355,7 @@
>>       if ((reader.directory == this.directory) || // if we own the directory
>>           (reader.directory == this.ramDirectory))
>> 	segmentsToDelete.addElement(reader);	  // queue segment for deletion
>>-      mergedDocCount += si.docCount;
>>+      mergedDocCount += reader.numDocs();
>>     }
>>     if (infoStream != null) {
>>       infoStream.println();
>>
>>import java.io.IOException;
>>
>>import org.apache.lucene.analysis.WhitespaceAnalyzer;
>>import org.apache.lucene.document.Document;
>>import org.apache.lucene.document.Field;
>>import org.apache.lucene.index.IndexReader;
>>import org.apache.lucene.index.IndexWriter;
>>import org.apache.lucene.store.Directory;
>>import org.apache.lucene.store.RAMDirectory;
>>
>>/*
>> * Created on 03.09.2003
>> *
>> * To change the template for this generated file go to
>> * Window>Preferences>Java>Code Generation>Code and Comments
>> */
>>
>>/**
>> * 
>> * @author goller
>> */
>>public class IndexWriterDocCountTest {
>>    
>>    int docCount = 0;
>>  
>>      void addDoc(IndexWriter writer)
>>      {
>>        Document doc = new Document();
>>    
>>        doc.add(Field.Keyword("id","id" + docCount));
>>        doc.add(Field.UnStored("content","aaa"));
>>    
>>        try {
>>          writer.addDocument(doc);
>>        }
>>        catch (IOException e) {
>>          // TODO Auto-generated catch block
>>          e.printStackTrace();
>>        }
>>        docCount++;
>>      }
>>    
>>    
>>
>>    public static void main(String[] args) {
>>        
>>        Directory dir = new RAMDirectory();
>>        IndexWriterDocCountTest test = new IndexWriterDocCountTest();
>>    
>>        IndexWriter writer = null;
>>        IndexReader reader = null;
>>        int i;
>>    
>>        try {
>>          writer = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
>>      
>>          for (i = 0; i < 100; i++)
>>            test.addDoc(writer);
>>      
>>          System.out.println("docCount: " + writer.docCount());
>>          writer.close();
>>          
>>          reader = IndexReader.open(dir);
>>          for (i = 0; i < 50; i++)
>>            reader.delete(i);
>>          reader.close();
>>          System.out.println("doc #0-49 deleted");
>>          
>>          writer = new IndexWriter(dir, new WhitespaceAnalyzer(), false);
>>          System.out.println("docCount: " + writer.docCount());
>>          
>>          writer.optimize();
>>          System.out.println("optimize called");
>>          System.out.println("docCount: " + writer.docCount());
>>          writer.close();
>>          
>>        }
>>        catch (IOException e) {
>>          // TODO Auto-generated catch block
>>          e.printStackTrace();
>>        }
>>    }
>>}
>>
>>
> ---------------------------------------------------------------------
> 
>>To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

-- 
*****************************************************************
* Dr. Christoph Goller       Tel.:   +49 89 203 45734           *
* Detego Software GmbH       Mobile: +49 179 1128469            *
* Keuslinstr. 13             Fax.:   +49 721 151516176          *
* 80798 München, Germany     Email:  goller@detego-software.de  *
*****************************************************************

