From: Stephane James Vaucher
To: Lucene Users List
Date: Wed, 18 Aug 2004 17:01:55 -0400 (EDT)
Subject: Re: Index Size

From: Doug Cutting
http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg08757.html

> An index typically requires around 35% of the plain text size.

By that measure, I think your index is a little big.

sv

On Wed, 18 Aug 2004, Rob Jose wrote:

> Hello
> I have indexed several thousand (52 to be exact) text files and I keep
> running out of disk space to store the indexes. The size of the
> documents I have indexed is around 2.5 GB. The size of the Lucene
> indexes is around 287 GB. Does this seem correct? I am not storing the
> contents of the file, just indexing and tokenizing. I am using Lucene
> 1.3 final. Can you guys let me know what you are experiencing? I don't
> want to go into production with something that I should be configuring
> better.
>
> I am not sure if this helps, but I have a temp index and a real index.
> I index the file into the temp index, and then merge the temp index
> into the real index using the addIndexes method on the IndexWriter. I
> have also set the production writer setUseCompoundFile to true. I did
> not set this on the temp index. The last thing that I do before
> closing the production writer is to call the optimize method.
>
> I would really appreciate any ideas to get the index size smaller if
> it is at all possible.
>
> Thanks
> Rob

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
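
To put numbers on the comparison, here is a rough sketch of the arithmetic, assuming the 35% figure quoted from Doug Cutting applies to this kind of text. The class and method names are made up for illustration and are not part of Lucene. With 2.5 GB of text the rule of thumb suggests an index of roughly 0.875 GB, so a 287 GB index is on the order of 328 times larger than expected.

```java
public class IndexSizeCheck {
    // Rule of thumb quoted in the thread: index is around 35% of plain text size.
    // This is an assumption about typical text, not a guarantee.
    static final double TYPICAL_RATIO = 0.35;

    // Expected index size in GB for a given amount of plain text in GB.
    static double expectedIndexGb(double textGb, double ratio) {
        return textGb * ratio;
    }

    // How many times larger the actual index is than the rule of thumb predicts.
    static double overshoot(double actualIndexGb, double textGb, double ratio) {
        return actualIndexGb / expectedIndexGb(textGb, ratio);
    }

    public static void main(String[] args) {
        double textGb = 2.5;    // size of the indexed documents, from the original post
        double indexGb = 287.0; // reported size of the Lucene indexes

        System.out.printf("expected index: about %.3f GB%n",
                expectedIndexGb(textGb, TYPICAL_RATIO));
        System.out.printf("actual / expected: about %.0fx%n",
                overshoot(indexGb, textGb, TYPICAL_RATIO));
    }
}
```

A gap that large probably points to something in the indexing setup rather than to the content itself, though the numbers alone do not say what.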