lucene-java-user mailing list archives

From Anshum <ansh...@gmail.com>
Subject Re: Wanting batch update to avoid high disk usage
Date Tue, 24 Aug 2010 04:10:56 GMT
Hi Justin,
Lucene does not reclaim space in place: each update translates to a deletion
followed by the addition of a new document. Ideally you would let the index
size grow during the updates and then call expungeDeletes or optimize once at
the end, unless you need to reclaim the disk space partway through for some
reason. expungeDeletes/optimize is when Lucene actually reclaims the deleted
document IDs and their space. Keep in mind, though, that both of these
operations are highly I/O intensive and can be time consuming, so consider
whether reclaiming space midway is really essential.
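
A minimal sketch of the "update everything, then clean up once" pattern,
assuming the Lucene 3.x API; the index path, field names, and document count
below are hypothetical:

  import java.io.File;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.util.Version;

  public class BatchUpdateSketch {
    public static void main(String[] args) throws Exception {
      Directory dir = FSDirectory.open(new File("/path/to/index")); // hypothetical path
      IndexWriter writer = new IndexWriter(dir,
          new StandardAnalyzer(Version.LUCENE_30),
          IndexWriter.MaxFieldLength.UNLIMITED);

      // Each updateDocument() is a delete plus an add, so the index
      // grows until the deleted copies are merged away.
      for (int i = 0; i < 1000; i++) {
        Document doc = new Document();
        doc.add(new Field("id", Integer.toString(i),
            Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("myfield", "new value",
            Field.Store.YES, Field.Index.ANALYZED));
        writer.updateDocument(new Term("id", Integer.toString(i)), doc);
      }

      // Reclaim the space held by the deleted copies once, at the end.
      // Both calls rewrite segments and are I/O intensive.
      writer.expungeDeletes();   // or writer.optimize();
      writer.commit();
      writer.close();
    }
  }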

Moreover, a commit by itself would not solve your problem of reclaiming the
lost disk space.

--
Anshum Gupta
http://ai-cafe.blogspot.com


On Tue, Aug 24, 2010 at 9:22 AM, Justin <crynax@yahoo.com> wrote:

> My actual code did not call expungeDeletes every time through the loop;
> however, calling expungeDeletes or optimize after the loop means that the
> index has doubled in size with all the deleted documents still sitting
> around. Or is it true that Lucene will try to reclaim disk space? I assume
> a commit would be required at some point.
>
> ----- Original Message ----
> From: Anshum <anshumg@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Mon, August 23, 2010 10:18:36 PM
> Subject: Re: Wanting batch update to avoid high disk usage
>
> Don't bother calling expungeDeletes so often; it makes no sense. Instead,
> call it once at the end. Since you are calling optimize at the end anyway,
> that should take care of it by itself; adding a call to expungeDeletes()
> in the loop makes no difference other than degrading performance.
>
> --
> Anshum Gupta
> http://ai-cafe.blogspot.com
>
>
> On Tue, Aug 24, 2010 at 4:38 AM, Justin <crynax@yahoo.com> wrote:
>
> > In an attempt to avoid doubling disk usage when adding new fields to all
> > existing documents, I added a call to IndexWriter::expungeDeletes. Then my
> > colleague pointed out that Lucene will rewrite the potentially large
> > segment files each time that method is called.
> >
> >
> >  reader = writer.getReader();
> >  for (int i = 0; i < n; i++) {
> >    // Term text must be a String, not an int
> >    Term idTerm = new Term("id", Integer.toString(i));
> >    TermDocs termDocs = reader.termDocs(idTerm);
> >    if (termDocs != null && termDocs.next()) {
> >      Document doc = reader.document(termDocs.doc());
> >      // Document.add() takes a Fieldable, e.g. a new Field
> >      doc.add(new Field("myfield", value, Field.Store.YES, Field.Index.ANALYZED));
> >      writer.updateDocument(idTerm, doc);
> >      //writer.expungeDeletes(true); // BAD: rewrites segment files each time
> >    }
> >  }
> >  reader.close();
> >  writer.commit();
> >  writer.optimize(true);
> >  writer.close();
> >
> >
> > The following Lucene FAQ entry suggests that disk space from deleted
> > documents will be reclaimed. Is this true, and is the saving worthwhile
> > enough to justify updating an existing index (followed by optimizing out
> > the deleted documents) instead of simply creating a new index?
> >
> >
> >
> > http://wiki.apache.org/lucene-java/LuceneFAQ#If_I_decide_not_to_optimize_the_index.2C_when_will_the_deleted_documents_actually_get_deleted.3F
> >
> >
> > Thanks for your help,
> > Justin
