lucene-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: [jira] Commented: (LUCENE-845) If you "flush by RAM usage" then IndexWriter may over-merge
Date Fri, 23 Mar 2007 20:09:26 GMT
Yeah, I haven't played with millions of documents yet.  We will need a
bigger test collection, I think!  Although the benchmarker can add as
many documents as you want from the same source, index compression will
possibly affect the results more than it would with a bigger collection
of all-unique docs.

Maybe it is time to look at adding Wikipedia as a test collection.  I  
think there are something like 18+ million docs in it.

On Mar 23, 2007, at 4:01 PM, Doug Cutting wrote:

> Michael McCandless wrote:
>> Also, one caveat: whenever #docs (21578 for Reuters) divided by
>> maxBuffered docs is less than mergeFactor, you will have no merges
>> take place during your runs.  This greatly skews the results.
>
> Also, my guess is that this index fits entirely in the buffer  
> cache. Things behave quite differently when segments are larger  
> than available memory and merging requires lots of disk i/o.
>
> Doug
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
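
To make Michael's caveat concrete, here is a rough sketch of the
arithmetic.  It assumes a simplified model in which a segment is
flushed every maxBufferedDocs documents and a merge only starts once
mergeFactor segments have accumulated.  The function names are
hypothetical and are not part of Lucene or the benchmarker; the
parameter combinations are only illustrative.

```python
# Simplified model (an assumption, not Lucene's actual merge policy code):
# one segment is flushed per max_buffered_docs documents, and the first
# merge needs merge_factor flushed segments.

REUTERS_DOCS = 21578  # document count quoted in the thread


def flushed_segments(num_docs, max_buffered_docs):
    """Full segments flushed during the run (ignores the final partial flush)."""
    return num_docs // max_buffered_docs


def merges_expected(num_docs, max_buffered_docs, merge_factor):
    """True if enough segments accumulate to trigger at least one merge."""
    return flushed_segments(num_docs, max_buffered_docs) >= merge_factor


# Illustrative settings only; none of these come from the thread.
for max_buffered, merge_factor in [(10, 10), (1000, 10), (1000, 100), (10000, 10)]:
    segments = flushed_segments(REUTERS_DOCS, max_buffered)
    merging = merges_expected(REUTERS_DOCS, max_buffered, merge_factor)
    print(f"maxBufferedDocs={max_buffered:>5} mergeFactor={merge_factor:>3} "
          f"segments={segments:>4} merges={'yes' if merging else 'no'}")
```

Under this model, a run with maxBufferedDocs=1000 and mergeFactor=100
flushes only 21 segments from Reuters, so no merge would ever be
measured.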

--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ


