Date: Sat, 10 Jan 2015 19:58:52 -0500
Subject: Re: Details on setting block parameters for Lucene41PostingsFormat
From: Tom Burton-West
To: java-user@lucene.apache.org

Thanks Mike,

We run our Solr 3.x indexing with 10GB/shard. I've been testing Solr 4 with 4, 6, and 8GB for heap. As of Friday night when the indexes were about half done (about 400GB on disk) only the 4GB had issues. I'll find out on Monday if the other runs had issues. If we can go from 10GB in Solr 3.x to 6GB with Solr 4.x, that will be a significant change.

With TermsIndexInterval we traded off less memory use for increased chance of disk seeks and more data to be read per seek (and if I remember right, that more data was scanned sequentially rather than binary searched). What is the trade-off when increasing the block size?

Tom

On Sat, Jan 10, 2015 at 4:46 AM, Michael McCandless <lucene@mikemccandless.com> wrote:

> The first int to Lucene41PostingsFormat is the min block size (default
> 25) and the second is the max (default 48) for the block tree terms
> dict.
>
> The max must be >= 2*(min-1).
>
> Since you were using 8X the default before, maybe try min=200 and
> max=398? However, block tree should have been more RAM efficient than
> 3.x's terms index... if you run CheckIndex with -verbose it will print
> additional details about the block structure of your terms indices...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Jan 9, 2015 at 4:15 PM, Tom Burton-West wrote:
>
> > Hello all,
> >
> > We have over 3 billion unique terms in our indexes and with Solr 3.x we set
> > the TermIndexInterval to about 8 times its default value in order to index
> > without OOMs.
> > (http://www.hathitrust.org/blogs/large-scale-search/too-many-words-again)
> >
> > We are now working with Solr 4 and running into memory issues and are
> > wondering if we need to do something analogous for Solr 4.
> >
> > The javadoc for IndexWriterConfig
> > (http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/index/IndexWriterConfig.html#setTermIndexInterval%28int%29)
> > indicates that the lucene 4.1 postings format has some parameters which may
> > be set:
> > "..To configure its parameters (the minimum and maximum size for a block),
> > you would instead use Lucene41PostingsFormat.Lucene41PostingsFormat(int, int)
> > <https://lucene.apache.org/core/4_10_2/core/org/apache/lucene/codecs/lucene41/Lucene41PostingsFormat.html#Lucene41PostingsFormat%28int,%20int%29>"
> >
> > Is there documentation or discussion somewhere about how to determine
> > appropriate parameters or some detail about what setting the maxBlockSize
> > and minBlockSize does?
> >
> > Tom Burton-West
> > http://www.hathitrust.org/blogs/large-scale-search
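
The arithmetic behind the suggested min=200 and max=398 can be sketched as follows. This is a minimal illustration, not Lucene code: the class `BlockSizeSketch` and its methods are hypothetical helpers, and picking the smallest max that the constraint allows is an assumption based on the defaults (25, 48) and the suggested values (200, 398) both satisfying max = 2*(min-1) exactly.

```java
/** Hypothetical helper for this thread; not part of the Lucene API. */
final class BlockSizeSketch {
    // Defaults quoted in the thread for Lucene41PostingsFormat.
    static final int DEFAULT_MIN_BLOCK_SIZE = 25;
    static final int DEFAULT_MAX_BLOCK_SIZE = 48;

    /** The constraint stated in the thread: max must be >= 2*(min-1). */
    static boolean satisfiesConstraint(int min, int max) {
        return max >= 2 * (min - 1);
    }

    /**
     * Scales the default min block size by the given factor and picks the
     * smallest max that the constraint allows (an assumption, see lead-in).
     * Returns {min, max}.
     */
    static int[] scaleDefaults(int factor) {
        if (factor < 1) {
            throw new IllegalArgumentException("factor must be >= 1");
        }
        int min = DEFAULT_MIN_BLOCK_SIZE * factor;
        int max = 2 * (min - 1);
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        int[] scaled = scaleDefaults(8); // 8X the default, as in the thread
        System.out.println("min=" + scaled[0] + " max=" + scaled[1]);
        System.out.println("valid=" + satisfiesConstraint(scaled[0], scaled[1]));
    }
}
```

The resulting pair would then be passed as the two int arguments of the Lucene41PostingsFormat(int, int) constructor mentioned above. Whether 8X is the right factor for a given index is an open question in this thread; CheckIndex with -verbose is the suggested way to inspect the actual block structure.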