lucene-dev mailing list archives

From "Michael McCandless" <>
Subject Re: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents
Date Fri, 23 Mar 2007 16:00:41 GMT

"Yonik Seeley" <> wrote:
> On 3/22/07, Michael McCandless <> wrote:
> > Merging is costly because you read all data in and then write all data
> > out, so you want to minimize, for each byte of data in the index,
> > how many times it will be "serviced" (read in, written out) as
> > part of a merge.
> Avoiding the re-writing of stored fields might be nice:

That's exactly the approach I'm taking in LUCENE-843: stored fields and term
vectors are immediately written to disk.  Only frq, prx and tis use up
memory.  This greatly extends how many docs you can buffer before
having to flush (assuming your docs have stored fields and term
vectors).
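
For concreteness, here's a rough sketch of the kind of indexing loop this
affects.  The Field/IndexWriter signatures below are just illustrative of
the 2.x-era API and may not match trunk exactly:

  // Sketch of an indexing loop where LUCENE-843's buffering matters.
  // Signatures follow the Lucene 2.x-era API and are illustrative only.
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.store.FSDirectory;

  public class BufferedIndexing {
    public static void main(String[] args) throws Exception {
      IndexWriter writer = new IndexWriter(
          FSDirectory.getDirectory("/tmp/index"), new StandardAnalyzer(), true);

      for (int i = 0; i < 1000000; i++) {
        Document doc = new Document();
        // Stored field + term vectors: written straight to disk when the
        // doc is added, so they do not consume the RAM buffer.
        doc.add(new Field("body", "text of doc " + i,
                          Field.Store.YES, Field.Index.TOKENIZED,
                          Field.TermVector.YES));
        // Only the postings (frq/prx) and terms (tis) for this field
        // accumulate in memory until the next flush.
        writer.addDocument(doc);
      }
      writer.close();
    }
  }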

When memory is full, I either flush a segment to disk (when the writer is
in autoCommit=true mode), or flush the data to tmp files which are
finally merged into a segment when the writer is closed.  This merging
is less costly because the bytes in/out are just frq, prx and tis, so
this improves performance of autoCommit=false mode vs autoCommit=true
mode.
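
A rough sketch of the two modes (the autoCommit constructor argument shown
here is illustrative; the exact signature may differ on trunk):

  // Sketch of the two flush behaviors described above; the autoCommit
  // constructor argument reflects the 2.x-era API and is illustrative only.
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;

  public class AutoCommitModes {
    public static void main(String[] args) throws Exception {
      Directory dir = FSDirectory.getDirectory("/tmp/index");
      Analyzer analyzer = new StandardAnalyzer();

      // autoCommit=true: when the RAM buffer fills, a real segment is
      // flushed to disk and is immediately part of the index.
      IndexWriter autoCommitWriter = new IndexWriter(dir, true, analyzer, true);
      autoCommitWriter.close();

      // autoCommit=false: flushes go to tmp files instead; they are merged
      // into a single new segment only when the writer is closed.
      IndexWriter batchWriter = new IndexWriter(dir, false, analyzer, false);
      batchWriter.close();
    }
  }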

But, this is only for the segment created from buffered docs (ie the
segment created by a "flush").  Subsequent merges still must copy
bytes in/out and in LUCENE-843 I haven't changed anything about how
segments are merged.
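
For reference, merge cost for those later merges is still controlled by the
usual knobs; a quick sketch, assuming the existing setMergeFactor /
setMaxBufferedDocs setters:

  // Illustrative only: normal segment merging is unchanged by LUCENE-843.
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.store.FSDirectory;

  public class MergeTuning {
    public static void main(String[] args) throws Exception {
      IndexWriter writer = new IndexWriter(
          FSDirectory.getDirectory("/tmp/index"), new StandardAnalyzer(), true);

      // With mergeFactor M, a byte that enters the index is re-read and
      // re-written roughly once per merge level it climbs, i.e. on the
      // order of log_M(total docs / buffered docs) times.
      writer.setMergeFactor(10);
      writer.setMaxBufferedDocs(1000);

      writer.close();
    }
  }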

