lucene-java-user mailing list archives

From Tom Burton-West <tburt...@umich.edu>
Subject Details on setting block parameters for Lucene41PostingsFormat
Date Fri, 09 Jan 2015 21:15:07 GMT
Hello all,

We have over 3 billion unique terms in our indexes and with Solr 3.x we set
the TermIndexInterval to about 8 times its default value in order to index
without OOMs.  (
http://www.hathitrust.org/blogs/large-scale-search/too-many-words-again)
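
For a rough sense of scale, here is a minimal sketch of the arithmetic behind that setting. It assumes the Lucene 3.x behaviour in which roughly every termIndexInterval-th term is sampled into the in-memory term index, the 3.x default interval of 128, and a multiplier of exactly 8 (the post says "about 8 times"). The class and method names are hypothetical, and the estimate counts index entries only, not bytes.

```java
// Back-of-envelope estimate only: counts sampled term-index entries,
// not their memory footprint. Names here are illustrative, not Lucene API.
final class TermIndexEstimate {
    // Default termIndexInterval in Lucene 3.x.
    static final int DEFAULT_INTERVAL = 128;

    /** Terms sampled into the in-memory index: ceil(uniqueTerms / interval). */
    static long indexedTerms(long uniqueTerms, int interval) {
        if (uniqueTerms < 0 || interval <= 0) {
            throw new IllegalArgumentException("uniqueTerms must be >= 0 and interval > 0");
        }
        return (uniqueTerms + interval - 1) / interval;
    }

    public static void main(String[] args) {
        long uniqueTerms = 3_000_000_000L;   // "over 3 billion": treated as a lower bound
        int raised = 8 * DEFAULT_INTERVAL;   // assumption: exactly 8x the default

        System.out.println("interval " + DEFAULT_INTERVAL + ": "
                + indexedTerms(uniqueTerms, DEFAULT_INTERVAL) + " sampled terms");
        System.out.println("interval " + raised + ": "
                + indexedTerms(uniqueTerms, raised) + " sampled terms");
    }
}
```

Under these assumptions the number of sampled terms drops from about 23.4 million to about 2.9 million.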

We are now working with Solr 4, are running into memory issues, and are
wondering whether we need to do something analogous.

The javadoc for IndexWriterConfig (
http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/index/IndexWriterConfig.html#setTermIndexInterval%28int%29
)
indicates that the Lucene 4.1 postings format has some parameters that may
be set:
"... To configure its parameters (the minimum and maximum size for a block),
you would instead use Lucene41PostingsFormat.Lucene41PostingsFormat(int, int)"
<https://lucene.apache.org/core/4_10_2/core/org/apache/lucene/codecs/lucene41/Lucene41PostingsFormat.html#Lucene41PostingsFormat%28int,%20int%29>

Is there documentation or discussion somewhere on how to choose appropriate
values, or some detail on what setting minBlockSize and maxBlockSize
actually does?

Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search
