lucene-java-user mailing list archives

From Tom Burton-West <tburt...@umich.edu>
Subject Re: Details on setting block parameters for Lucene41PostingsFormat
Date Sun, 11 Jan 2015 00:58:52 GMT
Thanks Mike,

We run our Solr 3.x indexing with 10GB/shard.  I've been testing Solr 4
with 4, 6, and 8GB of heap.  As of Friday night, when the indexes were about
half done (about 400GB on disk), only the 4GB run had issues.  I'll find out
on Monday whether the other runs had issues.  If we can go from 10GB in
Solr 3.x to 6GB with Solr 4.x, that will be a significant change.

With TermIndexInterval we traded lower memory use for a higher chance of
disk seeks and more data to be read per seek (and, if I remember right,
that extra data was scanned sequentially rather than binary searched).
What is the trade-off when increasing the block size?

Tom

On Sat, Jan 10, 2015 at 4:46 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> The first int to Lucene41PostingsFormat is the min block size (default
> 25) and the second is the max (default 48) for the block tree terms
> dict.
>
> The max must be >= 2*(min-1).
>
> Since you were using 8X the default before, maybe try min=200 and
> max=398?  However, block tree should have been more RAM efficient than
> 3.x's terms index... if you run CheckIndex with -verbose it will print
> additional details about the block structure of your terms indices...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Jan 9, 2015 at 4:15 PM, Tom Burton-West <tburtonw@umich.edu>
> wrote:
> > Hello all,
> >
> > We have over 3 billion unique terms in our indexes and with Solr 3.x we
> > set the TermIndexInterval to about 8 times its default value in order to
> > index without OOMs.
> > (http://www.hathitrust.org/blogs/large-scale-search/too-many-words-again)
> >
> > We are now working with Solr 4 and running into memory issues and are
> > wondering if we need to do something analogous for Solr 4.
> >
> > The javadoc for IndexWriterConfig
> > (http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/index/IndexWriterConfig.html#setTermIndexInterval%28int%29)
> > indicates that the Lucene 4.1 postings format has some parameters which
> > may be set:
> > "..To configure its parameters (the minimum and maximum size for a
> > block), you would instead use
> > Lucene41PostingsFormat.Lucene41PostingsFormat(int, int)
> > <https://lucene.apache.org/core/4_10_2/core/org/apache/lucene/codecs/lucene41/Lucene41PostingsFormat.html#Lucene41PostingsFormat%28int,%20int%29>"
> >
> > Is there documentation or discussion somewhere about how to determine
> > appropriate parameters or some detail about what setting the maxBlockSize
> > and minBlockSize does?
> >
> > Tom Burton-West
> > http://www.hathitrust.org/blogs/large-scale-search
>
>
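
For reference, the arithmetic in Mike's reply (defaults of 25 and 48, the rule max >= 2*(min-1), and the suggested 8X pair of 200 and 398) can be checked with a small sketch like the one below. This is only an illustration of the rule as quoted in this thread: the class and method names are hypothetical and not part of Lucene, and the extra sanity checks (positive min, max >= min) are assumptions rather than something stated above. The resulting pair is what would then be passed to Lucene41PostingsFormat(int, int).

```java
// Sketch only: encodes the block size rule quoted in this thread.
// BlockSizeCheck and its methods are hypothetical names, not Lucene API.
final class BlockSizeCheck {
    // Defaults as stated in the reply above.
    static final int DEFAULT_MIN = 25;
    static final int DEFAULT_MAX = 48;

    // Smallest max that satisfies max >= 2*(min-1).
    static int smallestValidMax(int min) {
        return 2 * (min - 1);
    }

    // True if the pair satisfies the quoted rule. The "min > 0" and
    // "max >= min" checks are assumed sanity checks, not from the thread.
    static boolean isValid(int min, int max) {
        return min > 0 && max >= min && max >= smallestValidMax(min);
    }

    public static void main(String[] args) {
        int factor = 8;                      // "8X the default", as in the thread
        int min = DEFAULT_MIN * factor;      // 25 * 8 = 200
        int max = smallestValidMax(min);     // 2 * (200 - 1) = 398
        System.out.println("min=" + min + " max=" + max
                + " valid=" + isValid(min, max));
    }
}
```

Scaling the default min by 8 and taking the smallest allowed max reproduces the min=200, max=398 pair suggested above. A max of 397 would fail the check.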
