lucene-java-user mailing list archives

From: Justin <cry...@yahoo.com>
Subject: Wanting batch update to avoid high disk usage
Date: Mon, 23 Aug 2010 23:08:23 GMT
In an attempt to avoid doubling disk usage when adding new fields to all 
existing documents, I added a call to IndexWriter.expungeDeletes(). Then my 
colleague pointed out that Lucene rewrites the potentially large segment 
files each time that method is called. Here is the update loop, with that 
call commented out:


  // Assumes writer is an open IndexWriter; n and value are defined elsewhere.
  IndexReader reader = writer.getReader(); // near-real-time reader (2.9+)
  for (int i = 0; i < n; i++) {
    Term idTerm = new Term("id", String.valueOf(i)); // Term values are Strings
    TermDocs termDocs = reader.termDocs(idTerm);
    try {
      if (termDocs.next()) {
        // Only stored fields come back here; unstored fields are lost.
        Document doc = reader.document(termDocs.doc());
        doc.add(new Field("myfield", value, Field.Store.YES, Field.Index.ANALYZED));
        writer.updateDocument(idTerm, doc); // deletes the old doc, adds the new one
        //writer.expungeDeletes(true); // BAD: rewrites segment files each time
      }
    } finally {
      termDocs.close();
    }
  }
  reader.close();
  writer.commit();
  writer.optimize(true); // single merge at the end instead of one per update
  writer.close();
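
The intent is to pay the segment-rewrite cost once: the single optimize at 
the end merges away all of the deleted (pre-update) copies in one pass, 
instead of rewriting segments on every iteration.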


The following Lucene FAQ entry suggests that the disk space held by deleted 
documents will be reclaimed. Is that true, and is the savings enough to make 
updating the existing index (followed by an optimize to purge the deleted 
documents) worthwhile, compared to simply building a new index from scratch?

http://wiki.apache.org/lucene-java/LuceneFAQ#If_I_decide_not_to_optimize_the_index.2C_when_will_the_deleted_documents_actually_get_deleted.3F
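
For reference, here is a minimal sketch (assuming the 2.9+/3.x IndexReader 
API; "directory" stands in for however the index is opened) of checking how 
many deletions are pending before committing to the optimize:

  IndexReader reader = IndexReader.open(directory, true); // read-only reader
  int deleted = reader.numDeletedDocs(); // docs marked deleted but still on disk
  int total = reader.maxDoc();           // live docs + deleted docs
  System.out.println(deleted + " of " + total + " docs are deleted");
  reader.close();
  // Deletions are only purged when their segments get merged;
  // optimize() forces a full merge and reclaims that space.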


Thanks for your help,
Justin