lucene-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents
Date Fri, 23 Mar 2007 19:19:45 GMT
I've only been loosely following this...

Do you think it is possible to separate the stored/term vector  
handling into a separate patch against the current trunk?  This seems  
like a quick win and I know it has been speculated about before.

On Mar 23, 2007, at 12:00 PM, Michael McCandless wrote:

>
> "Yonik Seeley" <yonik@apache.org> wrote:
>> On 3/22/07, Michael McCandless <lucene@mikemccandless.com> wrote:
>>> Merging is costly because you read all data in and then write all data
>>> out, so you want to minimize, for each byte of data in the index, how
>>> many times it will be "serviced" (read in, written out) as part of a
>>> merge.
>>
>> Avoiding the re-writing of stored fields might be nice:
>> http://www.nabble.com/Re%3A--jira--Commented%3A-%28LUCENE-565%29-Supporting-deleteDocuments-in-IndexWriter-%28Code-and-Performance-Results-Provided%29-p6177280.html
>
> That's exactly the approach I'm taking in LUCENE-843: stored fields
> and term vectors are immediately written to disk.  Only frq, prx and
> tis use up memory.  This greatly extends how many docs you can buffer
> before having to flush (assuming your docs have stored fields and term
> vectors).
>
> When memory is full, I flush a segment to disk if the writer is in
> autoCommit=true mode; otherwise I flush the data to tmp files, which
> are finally merged into a segment when the writer is closed.  This
> merging is less costly because the only bytes read in and written out
> are frq, prx and tis, so this improves the performance of
> autoCommit=false mode relative to autoCommit=true mode.
>
> But this is only for the segment created from buffered docs (i.e. the
> segment created by a "flush").  Subsequent merges still must copy
> bytes in/out, and in LUCENE-843 I haven't changed anything about how
> segments are merged.
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
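As a rough illustration of the point about how many times each byte is "serviced", here is a back-of-the-envelope sketch in Java. It assumes a merge reads every input byte once and writes it out once (no deletions, output about the same size as the input), and the sizes are made-up numbers, not measurements from LUCENE-843. The class and method names are hypothetical.

```java
/**
 * Back-of-the-envelope I/O model (assumption: a merge reads each input byte
 * once and writes it once, so every byte is "serviced" twice per merge).
 * All sizes below are hypothetical, not measurements.
 */
public class MergeIoSketch {

    /** Bytes read plus bytes written when merging inputs of the given total size. */
    static long servicedBytes(long inputBytes) {
        return 2 * inputBytes;
    }

    public static void main(String[] args) {
        long storedAndVectors = 800L << 20; // hypothetical: 800 MB of stored fields + term vectors
        long postings = 200L << 20;         // hypothetical: 200 MB of frq/prx/tis

        // Close-time merge if stored fields and term vectors are copied again:
        long rewriteAll = servicedBytes(storedAndVectors + postings);
        // Close-time merge if stored fields and term vectors were written to
        // disk once, up front, and only the postings are merged:
        long postingsOnly = servicedBytes(postings);

        System.out.println("rewrite everything: " + (rewriteAll >> 20) + " MB");
        System.out.println("postings only:      " + (postingsOnly >> 20) + " MB");
    }
}
```

With these made-up sizes the program prints 2000 MB versus 400 MB; the real ratio depends entirely on how large the stored fields and term vectors are relative to the postings.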

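The flush behaviour described in the quoted message can be modelled with a small Java class. This is a hypothetical sketch, not LUCENE-843's code or any IndexWriter API: the class, its fields and the single RAM budget counter are assumptions chosen only to show the control flow (stored fields and term vectors bypass the RAM budget; a full buffer becomes a segment with autoCommit=true, or a tmp run that is merged at close with autoCommit=false).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the buffering policy discussed for LUCENE-843.
 * Not Lucene code: names and byte accounting are illustrative assumptions.
 */
public class FlushPolicySketch {
    final long ramBudgetBytes;
    final boolean autoCommit;

    long bufferedPostingsBytes = 0;  // frq/prx/tis held in RAM
    long storedBytesOnDisk = 0;      // stored fields + term vectors, written immediately
    final List<String> segments = new ArrayList<>();
    final List<Long> tmpRuns = new ArrayList<>(); // postings bytes per tmp flush
    long postingsBytesMergedAtClose = 0;

    FlushPolicySketch(long ramBudgetBytes, boolean autoCommit) {
        this.ramBudgetBytes = ramBudgetBytes;
        this.autoCommit = autoCommit;
    }

    void addDocument(long storedAndVectorBytes, long postingsBytes) {
        storedBytesOnDisk += storedAndVectorBytes; // straight to disk, no RAM cost
        bufferedPostingsBytes += postingsBytes;    // only postings count against RAM
        if (bufferedPostingsBytes >= ramBudgetBytes) {
            flush();
        }
    }

    private void flush() {
        if (bufferedPostingsBytes == 0) {
            return;
        }
        if (autoCommit) {
            segments.add("_" + segments.size()); // each flush becomes a segment
        } else {
            tmpRuns.add(bufferedPostingsBytes);   // tmp files, not yet a segment
        }
        bufferedPostingsBytes = 0;
    }

    void close() {
        flush();
        if (!autoCommit && !tmpRuns.isEmpty()) {
            // Merge tmp runs into one segment: only postings bytes are re-read and re-written.
            for (long run : tmpRuns) {
                postingsBytesMergedAtClose += run;
            }
            tmpRuns.clear();
            segments.add("_" + segments.size());
        }
    }

    public static void main(String[] args) {
        for (boolean autoCommit : new boolean[] {true, false}) {
            FlushPolicySketch w = new FlushPolicySketch(100, autoCommit);
            for (int i = 0; i < 5; i++) {
                w.addDocument(1000, 40);
            }
            w.close();
            System.out.println("autoCommit=" + autoCommit
                + " segments=" + w.segments.size()
                + " postingsBytesMergedAtClose=" + w.postingsBytesMergedAtClose);
        }
    }
}
```

With these hypothetical numbers (a 100-byte budget, five documents each adding 40 bytes of postings) the autoCommit=true writer ends with two segments, while the autoCommit=false writer ends with one segment after re-copying only 200 bytes of postings at close. In both modes the 5000 bytes of stored data are written once; segments produced this way would still go through ordinary merging later, which copies all bytes.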
--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ
