Date: Fri, 7 Sep 2012 12:43:13 -0400
Subject: Solr 4.0 Beta, termIndexInterval vs termIndexDivisor vs termInfosIndexDivisor
From: Tom Burton-West
To: solr-user@lucene.apache.org

Hello all,

Due to multiple languages and dirty OCR, our indexes have over 2 billion unique terms (http://www.hathitrust.org/blogs/large-scale-search/too-many-words-again). In Solr 3.6 and earlier we needed to reduce the memory used for the in-memory representation of the tii file.

We originally used the termInfosIndexDivisor, which controls how the tii file is sampled when it is read into memory. While this solved our problem for searching, the termInfosIndexDivisor was unfortunately not read during indexing, which caused OOM problems once our indexes grew beyond a certain size. See: https://issues.apache.org/jira/browse/SOLR-2290. Has this been changed in Solr 4.0?

The advantage of the termInfosIndexDivisor is that it can be changed without re-indexing, so we were able to experiment with different values to find a good setting without re-indexing several terabytes of data.

When we ran into problems with the memory used for the in-memory representation of the tii file during indexing, we changed the termIndexInterval. The termIndexInterval is an index-time setting that affects the size of the tii file: it sets the sampling of the tis file that gets written to the tii file.

In Solr 4.0 termInfosIndexDivisor has been replaced with termIndexDivisor. The documentation for these two features, the index-time termIndexInterval and the run-time termIndexDivisor, no longer seems to be on the Solr config page of the wiki, and the documentation in the example file does not explain what the termIndexDivisor does.

Would it be appropriate to add these back to the wiki page? If not, could someone add a line or two to the comments in the Solr 4.0 example file explaining what the termIndexDivisor does?

Tom
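
For readers unfamiliar with the two settings, the sampling described in the message can be illustrated with some rough arithmetic. The sketch below is only an estimate under stated assumptions: the function name is hypothetical, the 2 billion term count is taken from the message, the default interval of 128 is assumed to match the Lucene 3.x default, and per-segment rounding is ignored.

```python
def in_memory_index_entries(unique_terms: int,
                            term_index_interval: int = 128,
                            divisor: int = 1) -> int:
    """Roughly estimate how many term-index entries end up in memory.

    Hypothetical helper for illustration only; it is not part of Solr
    or Lucene. The default of 128 is an assumption (Lucene 3.x default).
    """
    if term_index_interval < 1 or divisor < 1:
        raise ValueError("interval and divisor must both be >= 1")
    # Index time: about every term_index_interval-th term in the tis file
    # is written to the tii file.
    tii_entries = unique_terms // term_index_interval
    # Read time: only every divisor-th tii entry is loaded into memory.
    return tii_entries // divisor


unique_terms = 2_000_000_000  # figure quoted in the message

for interval, divisor in [(128, 1), (128, 8), (1024, 1)]:
    entries = in_memory_index_entries(unique_terms, interval, divisor)
    print(f"interval={interval:5d} divisor={divisor} -> ~{entries:,} entries in memory")
```

In this simplified model an interval of 128 with a divisor of 8 gives the same in-memory entry count as an interval of 1024 with a divisor of 1. The practical difference is the one the message points out: the divisor can be changed without re-indexing, while the interval is fixed when the index is written.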