lucene-dev mailing list archives

From Grant Ingersoll <>
Subject Re: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents
Date Fri, 23 Mar 2007 19:19:45 GMT
I've only been loosely following this...

Do you think it is possible to split the stored-field/term-vector
handling out into its own patch against the current trunk? This seems
like a quick win, and I know it has been speculated about before.

On Mar 23, 2007, at 12:00 PM, Michael McCandless wrote:

> "Yonik Seeley" <> wrote:
>> On 3/22/07, Michael McCandless <> wrote:
>>> Merging is costly because you read all data in and then write all data
>>> out, so you want to minimize, for each byte of data in the index, how
>>> many times it will be "serviced" (read in, written out) as part of a
>>> merge.
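A toy model makes the "how many times is each byte serviced" point concrete. It assumes, hypothetically, that every flush produces an equal-sized segment and that whenever mergeFactor segments accumulate at one level they are merged into a single segment at the next level. This is a simplification loosely inspired by mergeFactor-style merging, not Lucene's actual merge code; MergeCostModel and mergeBytesWritten are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeCostModel {
    /**
     * Hypothetical level-based merge model: each flush adds a segment of size 1
     * at level 0; when a level holds mergeFactor segments they are merged into
     * one segment at the next level. Returns the total units written by merges.
     */
    static long mergeBytesWritten(int flushes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        List<Integer> counts = new ArrayList<>(); // segments per level
        long written = 0;
        for (int f = 0; f < flushes; f++) {
            int level = 0;
            long size = 1; // size of a freshly flushed segment
            addSegment(counts, level);
            // Cascade: a full level is merged into one bigger segment above it.
            while (counts.get(level) == mergeFactor) {
                counts.set(level, 0);
                size *= mergeFactor; // merged segment holds all inputs
                written += size;     // every byte read in is written out once
                level++;
                addSegment(counts, level);
            }
        }
        return written;
    }

    private static void addSegment(List<Integer> counts, int level) {
        while (counts.size() <= level) {
            counts.add(0);
        }
        counts.set(level, counts.get(level) + 1);
    }

    public static void main(String[] args) {
        int flushes = 1000;
        long merged = mergeBytesWritten(flushes, 10);
        // Units rewritten by merges per unit originally flushed.
        System.out.println("rewrites per byte: " + (double) merged / flushes);
        // prints "rewrites per byte: 3.0"
    }
}
```

Under these assumptions each byte is rewritten about log_mergeFactor(flushes) times (3 times for 1000 flushes at mergeFactor 10), so producing fewer, larger flushed segments by buffering more docs in RAM directly cuts merge I/O.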
>> Avoiding the re-writing of stored fields might be nice:
>> Supporting-deleteDocuments-in-IndexWriter-%28Code-and-Performance-Results-Provided%29-p6177280.html
> That's exactly the approach I'm taking in LUCENE-843: stored fields and term
> vectors are immediately written to disk.  Only frq, prx and tis use up
> memory.  This greatly extends how many docs you can buffer before
> having to flush (assuming your docs have stored fields and term
> vectors).
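The buffering idea can be sketched as follows, under stated assumptions: stored field text is written to an OutputStream the moment a document is added, and only a term-to-docIDs map (a stand-in for the frq/prx/tis data) counts against the RAM budget. PostingsOnlyBuffer, its per-term and per-posting byte costs, and its flush() are hypothetical; this is not the LUCENE-843 patch.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class PostingsOnlyBuffer {
    // Hypothetical RAM cost estimates, chosen only for illustration.
    private static final long BYTES_PER_NEW_TERM = 32;
    private static final long BYTES_PER_POSTING = 4;

    // Stored fields are streamed here immediately (a file in a real writer).
    private final DataOutputStream storedOut;
    // Buffered postings: term -> doc IDs, kept sorted by term for flushing.
    private final TreeMap<String, List<Integer>> postings = new TreeMap<>();
    private final long ramBudgetBytes;
    private long ramUsed = 0;
    private int nextDocId = 0;
    private int flushCount = 0;

    public PostingsOnlyBuffer(OutputStream storedOut, long ramBudgetBytes) {
        this.storedOut = new DataOutputStream(storedOut);
        this.ramBudgetBytes = ramBudgetBytes;
    }

    public void addDocument(String storedText, String[] terms) {
        int docId = nextDocId++;
        try {
            // Written straight through; never counted against the RAM budget.
            storedOut.writeUTF(storedText);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String term : terms) {
            List<Integer> docs = postings.get(term);
            if (docs == null) {
                docs = new ArrayList<>();
                postings.put(term, docs);
                ramUsed += term.length() * 2L + BYTES_PER_NEW_TERM;
            }
            docs.add(docId);
            ramUsed += BYTES_PER_POSTING;
        }
        if (ramUsed >= ramBudgetBytes) {
            flush();
        }
    }

    // A real writer would write the buffered postings out as a segment or
    // tmp file here; this sketch only counts the flush and resets the buffer.
    private void flush() {
        flushCount++;
        postings.clear();
        ramUsed = 0;
    }

    public long ramUsed() { return ramUsed; }
    public int flushCount() { return flushCount; }
    public int bufferedTerms() { return postings.size(); }
}
```

Because stored text never enters the in-memory map, documents with large stored fields do not push the buffer toward a flush; only new terms and postings do.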
> When memory is full, I either flush a segment to disk (when the writer is
> in autoCommit=true mode) or flush the data to tmp files, which are
> finally merged into a segment when the writer is closed. This merging
> is less costly because the bytes in/out are just frq, prx and tis, so
> this improves performance of autoCommit=false mode vs autoCommit=true
> mode.
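The autoCommit=false path resembles an external merge sort: each full buffer becomes a sorted temporary run, and at close the runs are merged term by term. A minimal sketch, assuming each run is an in-memory sorted map from term to ascending doc IDs (real tmp files would be read sequentially from disk); RunMerger and mergeRuns are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.SortedMap;

public class RunMerger {
    // Cursor over one temporary run, positioned on its current term.
    private static final class Cursor {
        final int runIndex;
        final Iterator<Map.Entry<String, List<Integer>>> it;
        Map.Entry<String, List<Integer>> current;

        Cursor(int runIndex, Iterator<Map.Entry<String, List<Integer>>> it) {
            this.runIndex = runIndex;
            this.it = it;
            this.current = it.next(); // caller guarantees the run is non-empty
        }

        boolean advance() {
            if (!it.hasNext()) {
                return false;
            }
            current = it.next();
            return true;
        }
    }

    // k-way merge of sorted runs; terms are emitted in sorted order.
    public static Map<String, List<Integer>> mergeRuns(
            List<SortedMap<String, List<Integer>>> runs) {
        // Order by term, then by run index so earlier runs (lower doc IDs) go first.
        PriorityQueue<Cursor> queue = new PriorityQueue<>(
            Comparator.comparing((Cursor c) -> c.current.getKey())
                      .thenComparingInt(c -> c.runIndex));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                queue.add(new Cursor(i, runs.get(i).entrySet().iterator()));
            }
        }
        // LinkedHashMap keeps insertion order, which here is sorted term order.
        Map<String, List<Integer>> merged = new LinkedHashMap<>();
        while (!queue.isEmpty()) {
            Cursor c = queue.poll();
            merged.computeIfAbsent(c.current.getKey(), k -> new ArrayList<>())
                  .addAll(c.current.getValue());
            if (c.advance()) {
                queue.add(c); // re-insert positioned on its next term
            }
        }
        return merged;
    }
}
```

Ties on the same term are broken by run index, so postings from earlier runs (lower doc IDs) are appended first and each merged list stays in doc-ID order. Since the runs carry only postings data, this close-time merge moves far fewer bytes than a merge that also copies stored fields and term vectors.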
> But this applies only to the segment created from buffered docs (i.e., the
> segment created by a "flush"). Subsequent merges still must copy
> bytes in/out, and in LUCENE-843 I haven't changed anything about how
> segments are merged.
> Mike
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

Grant Ingersoll
Center for Natural Language Processing

Read the Lucene Java FAQ at 
