From: Phillip Farber
Date: Thu, 1 Oct 2009 12:18:59 -0400
To: "solr-user@lucene.apache.org"
Subject: best way to get the size of an index

Resuming this discussion in a new thread to focus only on this question:

What is the best way to get the size of an index so it does not get too big to be optimized (or to allow a very large
segment merge) given space limits?

I already have the largest 15,000 rpm SCSI direct-attached storage, so buying more storage is not an option. I don't do deletes. From what I've read, I expect no more than a 2x increase in index size during optimization, and I have not seen more in practice.

I'm thinking: stop indexing, commit, then run du on the index directory. Will this give me the number I need for what I'm trying to do? Is there a better way?

Phil
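
For illustration, the check described above (stop indexing, commit, measure the index directory, compare against free space) could be sketched roughly as follows. This is only a sketch under stated assumptions: the function names and the `peak_factor` parameter are hypothetical, the 2x figure is read here as "peak disk usage during optimize is about twice the current index size", and the script sums apparent file sizes, so its result can differ slightly from the block counts that du reports.

```python
import os
import shutil


def index_size_bytes(index_dir):
    """Sum the apparent sizes of all regular files under index_dir."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            path = os.path.join(root, name)
            # Skip symlinks so that linked files are not counted.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def can_optimize(index_dir, peak_factor=2.0):
    """Return (fits, size, free) for the filesystem holding index_dir.

    Assumption: peak usage during optimize is peak_factor * current size,
    so the extra space needed is (peak_factor - 1) * current size.
    """
    size = index_size_bytes(index_dir)
    free = shutil.disk_usage(index_dir).free
    extra_needed = int(size * (peak_factor - 1.0))
    return extra_needed <= free, size, free
```

The measurement is only meaningful if indexing is stopped and a commit has completed first, as described above; otherwise files may appear or disappear while the directory is being walked. Calling `can_optimize("/path/to/index")` (the path is a placeholder) returns whether the estimated extra space fits, along with the measured size and free bytes.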