lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Details on setting block parameters for Lucene41PostingsFormat
Date Sat, 10 Jan 2015 09:46:59 GMT
The first int passed to the Lucene41PostingsFormat constructor is the min
block size (default 25) and the second is the max block size (default 48)
for the block tree terms dictionary.

The max must be >= 2*(min-1).
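A minimal, dependency-free sketch of that rule. The class and method names are hypothetical illustrations, not Lucene API. It also shows that the default max of 48 is exactly the smallest value the rule allows for the default min of 25.

```java
// Hypothetical helper (not Lucene API) illustrating the rule
// max >= 2 * (min - 1) between the two block-size parameters.
public class BlockSizeCheck {

    /** True if (minBlockSize, maxBlockSize) satisfies max >= 2 * (min - 1). */
    static boolean isValid(int minBlockSize, int maxBlockSize) {
        return maxBlockSize >= 2 * (minBlockSize - 1);
    }

    /** Smallest max block size the rule allows for a given min. */
    static int smallestLegalMax(int minBlockSize) {
        return 2 * (minBlockSize - 1);
    }

    public static void main(String[] args) {
        System.out.println(isValid(25, 48));      // defaults: true
        System.out.println(smallestLegalMax(25)); // 48
        System.out.println(isValid(200, 398));    // true
        System.out.println(isValid(200, 397));    // false
    }
}
```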

Since you were using 8X the default TermIndexInterval before, maybe try
min=200 (8 * 25) and max=398 (the smallest max allowed for that min)?
That said, block tree should already be more RAM efficient than 3.x's
terms index.  If you run CheckIndex with -verbose, it will print
additional details about the block structure of your terms indices.
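A rough sketch of the arithmetic behind that suggestion: scale the default min by 8, then take the smallest max the rule permits. It also prints a crude range for the number of blocks needed for the 3 billion unique terms mentioned in the quoted message below. That range rests on a loud simplifying assumption, namely that every block holds between min and max terms; real block tree layouts also contain sub-block entries, floor blocks and a root block, so treat it as order-of-magnitude only. All names are hypothetical, not Lucene API.

```java
// Hypothetical back-of-the-envelope calculator (not Lucene API).
// ASSUMPTION: every block holds between min and max terms; real block
// tree layouts also have sub-block entries, floor blocks and a root
// block, so the block counts below are rough estimates only.
public class BlockSizeScaling {

    static final int DEFAULT_MIN = 25; // default min block size

    /** Scale the default min by the factor used for TermIndexInterval. */
    static int scaledMin(int factor) {
        return DEFAULT_MIN * factor;
    }

    /** Smallest max allowed by max >= 2 * (min - 1). */
    static int maxFor(int min) {
        return 2 * (min - 1);
    }

    /** Block count if every block were filled to max (fewest blocks). */
    static long fewestBlocks(long terms, int max) {
        return terms / max;
    }

    /** Block count if every block held only min terms (most blocks). */
    static long mostBlocks(long terms, int min) {
        return terms / min;
    }

    public static void main(String[] args) {
        long terms = 3_000_000_000L;  // "over 3 billion unique terms"
        int min = scaledMin(8);       // 200
        int max = maxFor(min);        // 398
        System.out.printf("min=%d max=%d blocks: %d..%d%n",
                min, max, fewestBlocks(terms, max), mostBlocks(terms, min));
        // prints: min=200 max=398 blocks: 7537688..15000000
    }
}
```

Under the same simplification the defaults (25/48) give about 62.5 to 120 million blocks, roughly 8x more.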

Mike McCandless

http://blog.mikemccandless.com


On Fri, Jan 9, 2015 at 4:15 PM, Tom Burton-West <tburtonw@umich.edu> wrote:
> Hello all,
>
> We have over 3 billion unique terms in our indexes and with Solr 3.x we set
> the TermIndexInterval to about 8 times its default value in order to index
> without OOMs
> (http://www.hathitrust.org/blogs/large-scale-search/too-many-words-again).
>
> We are now working with Solr 4 and running into memory issues and are
> wondering if we need to do something analogous for Solr 4.
>
> The javadoc for IndexWriterConfig
> (http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/index/IndexWriterConfig.html#setTermIndexInterval%28int%29)
> indicates that the Lucene 4.1 postings format has some parameters which may
> be set:
> "...To configure its parameters (the minimum and maximum size for a block),
> you would instead use Lucene41PostingsFormat.Lucene41PostingsFormat(int, int)"
> (<https://lucene.apache.org/core/4_10_2/core/org/apache/lucene/codecs/lucene41/Lucene41PostingsFormat.html#Lucene41PostingsFormat%28int,%20int%29>)
>
> Is there documentation or discussion somewhere about how to determine
> appropriate parameters or some detail about what setting the maxBlockSize
> and minBlockSize does?
>
> Tom Burton-West
> http://www.hathitrust.org/blogs/large-scale-search


