lucene-java-user mailing list archives

From Tom Burton-West <tburt...@umich.edu>
Subject Re: Details on setting block parameters for Lucene41PostingsFormat
Date Mon, 12 Jan 2015 20:44:45 GMT
Thanks Mike,

Do you know how I can configure Solr to use the min=200 and
max=398 block sizes you suggested?  Or should I ask on the Solr list?
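As a quick check on the suggested values, here is a minimal Java sketch that assumes only the rule Mike quotes (max >= 2*(min-1)) and the defaults he gives (min 25, max 48). The class and method names (`BlockSizeCheck`, `isValid`, `smallestAllowedMax`) are hypothetical, and no Lucene API is called; in Lucene itself the chosen pair would be what gets passed to the `Lucene41PostingsFormat(int, int)` constructor.

```java
/**
 * Hypothetical helper: checks candidate (min, max) block sizes for the
 * Lucene 4.1 block tree terms dictionary against the rule quoted in this
 * thread, max >= 2 * (min - 1). Does not depend on Lucene.
 */
final class BlockSizeCheck {

    // Defaults for Lucene41PostingsFormat, as stated in the thread.
    static final int DEFAULT_MIN = 25;
    static final int DEFAULT_MAX = 48;

    /** Smallest max block size the rule allows for a given min. */
    static int smallestAllowedMax(int min) {
        return 2 * (min - 1);
    }

    /** True if (min, max) satisfies max >= 2 * (min - 1). */
    static boolean isValid(int min, int max) {
        return max >= smallestAllowedMax(min);
    }

    public static void main(String[] args) {
        int factor = 8;                       // "8X the default", as with TermIndexInterval in 3.x
        int min = factor * DEFAULT_MIN;       // 200
        int naiveMax = factor * DEFAULT_MAX;  // 384: both defaults scaled by 8
        int max = smallestAllowedMax(min);    // 398: max derived from the rule

        System.out.println("defaults valid: " + isValid(DEFAULT_MIN, DEFAULT_MAX));
        System.out.println("min=" + min + ", max=" + naiveMax + " valid: " + isValid(min, naiveMax));
        System.out.println("min=" + min + ", max=" + max + " valid: " + isValid(min, max));
    }
}
```

Running `main` reports that the defaults are valid (2*(25-1) = 48, exactly the bound), that scaling both defaults by 8 (200 and 384) violates the rule, and that min=200 with max=398 passes. One plausible reading, not confirmed in the thread, is that 398 rather than 384 was suggested because it is the smallest max the rule allows for min=200, just as 48 is for the default min of 25.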

Tom

On Sat, Jan 10, 2015 at 4:46 AM, Michael McCandless <lucene@mikemccandless.com> wrote:

> The first int to Lucene41PostingsFormat is the min block size (default
> 25) and the second is the max (default 48) for the block tree terms
> dict.
>
> The max must be >= 2*(min-1).
>
> Since you were using 8X the default before, maybe try min=200 and
> max=398?  However, block tree should have been more RAM efficient than
> 3.x's terms index... if you run CheckIndex with -verbose it will print
> additional details about the block structure of your terms indices...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Jan 9, 2015 at 4:15 PM, Tom Burton-West <tburtonw@umich.edu>
> wrote:
> > Hello all,
> >
> > We have over 3 billion unique terms in our indexes and with Solr 3.x we
> > set the TermIndexInterval to about 8 times its default value in order to
> > index without OOMs.
> > (http://www.hathitrust.org/blogs/large-scale-search/too-many-words-again)
> >
> > We are now working with Solr 4 and running into memory issues and are
> > wondering if we need to do something analogous for Solr 4.
> >
> > The javadoc for IndexWriterConfig
> > (http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/index/IndexWriterConfig.html#setTermIndexInterval%28int%29)
> > indicates that the Lucene 4.1 postings format has some parameters which
> > may be set:
> > "...To configure its parameters (the minimum and maximum size for a
> > block), you would instead use
> > Lucene41PostingsFormat.Lucene41PostingsFormat(int, int)
> > <https://lucene.apache.org/core/4_10_2/core/org/apache/lucene/codecs/lucene41/Lucene41PostingsFormat.html#Lucene41PostingsFormat%28int,%20int%29>"
> >
> > Is there documentation or discussion somewhere about how to determine
> > appropriate parameters or some detail about what setting the maxBlockSize
> > and minBlockSize does?
> >
> > Tom Burton-West
> > http://www.hathitrust.org/blogs/large-scale-search
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
