lucene-dev mailing list archives

From Grant Ingersoll
Subject Re: [jira] Commented: (LUCENE-845) If you "flush by RAM usage" then IndexWriter may over-merge
Date Fri, 23 Mar 2007 20:09:26 GMT
Yeah, I haven't played with millions of documents yet.  We will need a
bigger test collection, I think!  Although the benchmarker can add as
many docs as you want from the same source, index compression will affect
the results, possibly more than a bigger collection of all-unique docs would.

Maybe it is time to look at adding Wikipedia as a test collection.  I  
think there are something like 18+ million docs in it.

On Mar 23, 2007, at 4:01 PM, Doug Cutting wrote:

> Michael McCandless wrote:
>> Also, one caveat: whenever #docs (21578 for Reuters) divided by
>> maxBufferedDocs is less than mergeFactor, no merges will
>> take place during your runs.  This greatly skews the results.
> Also, my guess is that this index fits entirely in the buffer  
> cache. Things behave quite differently when segments are larger  
> than available memory and merging requires lots of disk i/o.
> Doug
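To make the caveat quoted above concrete, here is a minimal Java sketch, assuming a simplified model of level-based merging when flushing by document count: every maxBufferedDocs documents are flushed as one segment, and whenever mergeFactor segments accumulate at the same level they are merged into one segment at the next level up. The class and method names (MergeSimulation, countMerges) are hypothetical; this is not IndexWriter's actual code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of level-based merging when flushing
 * by document count. Not Lucene's actual IndexWriter implementation.
 */
public class MergeSimulation {

    /** Returns the number of merges performed while adding numDocs documents. */
    static int countMerges(int numDocs, int maxBufferedDocs, int mergeFactor) {
        // levels.get(i) = number of segments currently sitting at level i
        List<Integer> levels = new ArrayList<>();
        int merges = 0;
        int buffered = 0;
        for (int doc = 0; doc < numDocs; doc++) {
            buffered++;
            if (buffered == maxBufferedDocs) {
                merges += flush(levels, mergeFactor);
                buffered = 0;
            }
        }
        if (buffered > 0) {
            // Final flush of the partially filled buffer (e.g. on close).
            merges += flush(levels, mergeFactor);
        }
        return merges;
    }

    /** Adds one level-0 segment, cascades any merges, and returns how many ran. */
    private static int flush(List<Integer> levels, int mergeFactor) {
        int merges = 0;
        int level = 0;
        increment(levels, level);
        // Once mergeFactor segments share a level, merge them into one segment a level up.
        while (levels.get(level) >= mergeFactor) {
            levels.set(level, levels.get(level) - mergeFactor);
            merges++;
            level++;
            increment(levels, level);
        }
        return merges;
    }

    private static void increment(List<Integer> levels, int level) {
        while (levels.size() <= level) {
            levels.add(0);
        }
        levels.set(level, levels.get(level) + 1);
    }

    public static void main(String[] args) {
        int reutersDocs = 21578;
        int mergeFactor = 10;
        for (int maxBufferedDocs : new int[] {100, 1000, 2500, 10000}) {
            int flushes = (reutersDocs + maxBufferedDocs - 1) / maxBufferedDocs;
            System.out.printf("maxBufferedDocs=%d: %d flushed segments, %d merges%n",
                    maxBufferedDocs, flushes,
                    countMerges(reutersDocs, maxBufferedDocs, mergeFactor));
        }
    }
}
```

Under this model, 21578 documents with maxBufferedDocs of 2500 or 10000 produce only 9 or 3 segments, fewer than a mergeFactor of 10, so no merge ever runs; with 100 the same run performs 23 merges. A count like this says nothing about merge cost, which, per Doug's point, depends heavily on whether the segments fit in the buffer cache.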

Grant Ingersoll
Center for Natural Language Processing

Read the Lucene Java FAQ at 
