lucene-java-user mailing list archives

From Chuck Williams <ch...@allthingslocal.com>
Subject Seeking advice on index parameter settings for large index
Date Wed, 30 Mar 2005 05:13:53 GMT
I'm preparing to help a company run a scalability test and decide 
whether or not to use Lucene.  Relevant particulars for the test include:
  1.  Two pairs of indices.  Each pair has one index with about 7.5 million 
small documents and one index with about 1 million large documents.  Each 
index has a substantial number of small fields in addition to the main 
document text.
  2.  Searching will be done using a node for each index pair -- i.e., the 
test will use a MultiSearcher accessing the remote indices.
  3.  Indexing and searching will be done simultaneously -- indexing 
will be incremental and continual.  There are no deletes.
  4.  The platform is Windows.
  5.  Both search and indexing performance are essential, so they need to 
be balanced.
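
A MultiSearcher over several indices has to combine each index's hits into a single ranking. Roughly, it shifts each sub-index's document numbers by that index's starting offset and keeps the overall top-scoring hits. The sketch below models only that merge step in plain Java, without Lucene; the `Hit` class, the `merge` method, and the sample scores are hypothetical, and it leaves out RMI and any cross-index score normalization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class MultiIndexMerge {

    /** A hit from one sub-index: local document number plus score (hypothetical type). */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
        @Override public String toString() { return doc + ":" + score; }
    }

    /**
     * Merges per-index top hits into one global top-n list.
     * starts[i] is the first global doc number of sub-index i, so local doc d
     * in sub-index i becomes global doc starts[i] + d.
     */
    static List<Hit> merge(List<List<Hit>> perIndex, int[] starts, int n) {
        // Min-heap: the weakest of the current top n sits at the head
        // (lower score first; on equal scores, the higher doc number is weaker).
        PriorityQueue<Hit> queue = new PriorityQueue<>(
            (a, b) -> a.score != b.score ? Float.compare(a.score, b.score)
                                         : Integer.compare(b.doc, a.doc));
        for (int i = 0; i < perIndex.size(); i++) {
            for (Hit h : perIndex.get(i)) {
                queue.add(new Hit(starts[i] + h.doc, h.score));
                if (queue.size() > n) {
                    queue.poll(); // drop the weakest hit
                }
            }
        }
        // Drain weakest-first, inserting at the front to get best-first order.
        List<Hit> result = new ArrayList<>();
        while (!queue.isEmpty()) {
            result.add(0, queue.poll());
        }
        return result;
    }

    public static void main(String[] args) {
        // Two hypothetical nodes, each returning its own best hits.
        List<List<Hit>> perIndex = List.of(
            List.of(new Hit(3, 0.9f), new Hit(7, 0.4f)),
            List.of(new Hit(1, 0.8f), new Hit(5, 0.6f)));
        int[] starts = {0, 1000}; // node 0 holds docs 0..999, node 1 starts at 1000
        System.out.println(merge(perIndex, starts, 3)); // prints [3:0.9, 1001:0.8, 1005:0.6]
    }
}
```

Keeping a bounded heap of size n means each node only needs to return its own top n hits for the merged top n to be correct.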

Based on some early measurements with small test sets, but mostly on first 
principles, I'm thinking of using these settings.  The index will take a 
long time to create and I will probably get only one chance to prove what 
Lucene can do, so I'd appreciate any advice or experience that would 
suggest different settings:

        // index is the IndexWriter being configured; similarity is our Similarity
        index.setMaxBufferedDocs(10);     // Buffer 10 documents at a time in memory (they could be big)
        index.setMaxFieldLength(Integer.MAX_VALUE);  // We do the limiting ourselves by what we pass in
        index.setMaxMergeDocs(100000);    // Yields about 75 large segments for 7.5 million docs (plus log2 smaller segments) = 100 total
        index.setMergeFactor(2);          // Faster searches due to fewer (small) segments, but slower indexing due to more frequent merging
        index.setSimilarity(similarity);
        index.setTermIndexInterval(128);  // Default.  A larger number will reduce memory at the cost of slower term access
        index.setUseCompoundFile(true);   // false could improve performance but will consume more file handles
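
The merge-related comments (maxBufferedDocs, maxMergeDocs, mergeFactor) rest on segment arithmetic that a small simulation can check. The sketch below is a simplified plain-Java model of a logarithmic merge policy, not Lucene's code: it assumes every flush produces a segment of maxBufferedDocs documents, and that the newest mergeFactor equal-sized segments are merged whenever the result stays within maxMergeDocs. `SegmentModel`, `simulate`, and `Result` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentModel {

    /** Outcome of one simulated indexing run (hypothetical helper type). */
    static final class Result {
        final int segments;
        final long docsCopied;
        Result(int segments, long docsCopied) {
            this.segments = segments;
            this.docsCopied = docsCopied;
        }
    }

    /**
     * Simplified logarithmic merge model: every maxBufferedDocs documents are
     * flushed as a new segment; whenever the newest mergeFactor segments have
     * the same size they are merged into one, unless the merged segment would
     * exceed maxMergeDocs.
     */
    static Result simulate(long totalDocs, int maxBufferedDocs,
                           int mergeFactor, long maxMergeDocs) {
        List<Long> sizes = new ArrayList<>();
        long docsCopied = 0;
        for (long added = 0; added < totalDocs; added += maxBufferedDocs) {
            sizes.add(Math.min(maxBufferedDocs, totalDocs - added)); // flush
            while (sizes.size() >= mergeFactor) {
                int first = sizes.size() - mergeFactor;
                long size = sizes.get(first);
                boolean sameSize = true;
                for (int i = first + 1; i < sizes.size(); i++) {
                    if (sizes.get(i) != size) { sameSize = false; break; }
                }
                long merged = size * mergeFactor;
                if (!sameSize || merged > maxMergeDocs) break;
                sizes.subList(first, sizes.size()).clear(); // remove the merge inputs
                sizes.add(merged);
                docsCopied += merged; // each input document is rewritten once
            }
        }
        return new Result(sizes.size(), docsCopied);
    }

    public static void main(String[] args) {
        // Settings from the message: 10 buffered docs, mergeFactor 2, maxMergeDocs 100000.
        Result proposed = simulate(7_500_000L, 10, 2, 100_000L);
        // Same buffering with Lucene's default mergeFactor of 10, for comparison.
        Result mf10 = simulate(7_500_000L, 10, 10, 100_000L);
        System.out.println("mergeFactor=2:  " + proposed.segments + " segments, "
            + proposed.docsCopied + " docs copied by merges");
        System.out.println("mergeFactor=10: " + mf10.segments + " segments, "
            + mf10.docsCopied + " docs copied by merges");
    }
}
```

Under this model the proposed settings end with 96 segments for 7.5 million documents: 91 of 81,920 documents plus 5 smaller ones, rather than about 75 of 100,000. With mergeFactor 2, segment sizes double from 10, and 81,920 is the largest such size not exceeding 100,000. With mergeFactor 10 the same model gives exactly 75 segments of 100,000 documents, and merges rewrite each document 4 times instead of about 13. Real merging may differ in details, so treat these numbers as estimates.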

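The setTermIndexInterval comment describes a memory/speed trade-off. Lucene keeps only every Nth term of the term dictionary in memory (128 by default), binary-searches that sample, and then scans forward from the nearest sampled term. The plain-Java sketch below models that two-level lookup over a sorted array; the class, its methods, and the synthetic terms are hypothetical, and it ignores Lucene's actual file encoding.

```java
import java.util.Arrays;
import java.util.Locale;

public class TermIndexSketch {

    private final String[] terms;  // stands in for the sorted on-disk term dictionary
    private final String[] index;  // every interval-th term, held in memory
    private final int interval;

    TermIndexSketch(String[] sortedTerms, int interval) {
        this.terms = sortedTerms;
        this.interval = interval;
        int n = (sortedTerms.length + interval - 1) / interval; // ceiling division
        this.index = new String[n];
        for (int i = 0; i < n; i++) {
            index[i] = sortedTerms[i * interval];
        }
    }

    /** Number of terms kept in memory. */
    int indexSize() {
        return index.length;
    }

    /**
     * Returns the dictionary position of term, or -1 if absent.
     * scanned[0] receives how many dictionary entries were compared.
     */
    int lookup(String term, int[] scanned) {
        scanned[0] = 0;
        // Binary search the in-memory sample for the last entry <= term.
        int pos = Arrays.binarySearch(index, term);
        int block = pos >= 0 ? pos : -pos - 2;
        if (block < 0) {
            return -1; // term sorts before the first dictionary term
        }
        // Scan forward through at most `interval` entries of that block.
        int start = block * interval;
        int end = Math.min(start + interval, terms.length);
        for (int i = start; i < end; i++) {
            scanned[0]++;
            int cmp = terms[i].compareTo(term);
            if (cmp == 0) return i;
            if (cmp > 0) return -1; // passed the spot where the term would be
        }
        return -1;
    }

    public static void main(String[] args) {
        // 100,000 synthetic terms; zero padding keeps them in sorted order.
        String[] terms = new String[100_000];
        for (int i = 0; i < terms.length; i++) {
            terms[i] = String.format(Locale.ROOT, "t%06d", i);
        }
        for (int interval : new int[] {128, 256}) {
            TermIndexSketch dict = new TermIndexSketch(terms, interval);
            int[] scanned = new int[1];
            long totalScanned = 0;
            for (String t : terms) {
                dict.lookup(t, scanned);
                totalScanned += scanned[0];
            }
            System.out.printf(Locale.ROOT,
                "interval %d: %d terms in memory, %.1f entries scanned per lookup on average%n",
                interval, dict.indexSize(), (double) totalScanned / terms.length);
        }
    }
}
```

For this synthetic dictionary, doubling the interval from 128 to 256 halves the in-memory sample (782 vs 391 terms) while roughly doubling the average scan (about 64 vs 128 entries per lookup).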
Thanks for any suggestions!

Chuck


