Message-ID: <437298F1.40205@nuix.com.au>
Date: Thu, 10 Nov 2005 11:48:49 +1100
From: Daniel Noll
Organization: NUIX Pty Limited
To: Lucene Mailing List
Subject: Memory Usage

Hi.

What is the expected memory usage of Lucene these days?

I dug up an old email [1] from 2001 which gave the following summary of memory usage:

    An IndexReader requires:
      one byte per field per document in index (norms)
      one open file per file in index
      1/128 of the Terms in the index
        a Term has two pointers (8 bytes) and a String
        (4 pointers = 24 bytes, one to 16-bit chars)

From this, we determined the norms to be by far the biggest problem, and set about removing them based on a patch submitted on the issue tracker [2].

However, now we've met the next hurdle: the terms use much more memory than suggested above. Profiling a text index with roughly 32,000,000 terms, we have about:

  * 13MB of char[]
  * 6MB of java.lang.String
  * 6MB of org.apache.lucene.index.Term
  * 8MB of org.apache.lucene.index.TermInfo
  => Total = 33MB

This actually equates to about:

  * 52 bytes (average, depends on the term lengths in the index) per char[]
  * 24 bytes per String
  * 24 bytes per Term
  * 32 bytes per TermInfo
  => 132 bytes per term, for the 1 in 128 terms which are held.

This isn't a problem in the current state, but when loading 30 of these text indexes at once, we start running into serious memory usage issues.

My question is: is this 1/128 figure set in stone, or can it be changed without major consequences? I would rather have an application which used less memory and took longer, than one which uses all the available RAM just to milk out a bit of extra speed.

Daniel

References:
[1]: http://rubyurl.com/uF1
[2]: http://issues.apache.org/jira/browse/LUCENE-448

-- 
Daniel Noll
NUIX Pty Ltd
Level 8, 143 York Street, Sydney 2000
Phone: (02) 9283 9010
Fax: (02) 9283 9020
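
As a rough check on the figures in the message above, the arithmetic can be sketched in a few lines of Java. This is only a back-of-the-envelope model, not anything from Lucene itself: the class name, method names, and the alternative interval values (256, 512, 1024) are made up for illustration, the per-object sizes are the profiled numbers quoted above, and "MB" is taken as 10^6 bytes, which is what makes 13MB over 250,000 held terms come out at 52 bytes. It also assumes that memory scales linearly with the number of held terms.

```java
public final class TermIndexMemoryEstimate {
    // Per-object sizes in bytes, taken from the profile quoted in the message.
    static final long CHAR_ARRAY_BYTES = 52; // average, depends on term lengths
    static final long STRING_BYTES = 24;
    static final long TERM_BYTES = 24;
    static final long TERM_INFO_BYTES = 32;

    /** Bytes held in memory for each term that is kept (132 with the sizes above). */
    public static long bytesPerHeldTerm() {
        return CHAR_ARRAY_BYTES + STRING_BYTES + TERM_BYTES + TERM_INFO_BYTES;
    }

    /** Number of terms kept in memory when 1 in every indexInterval terms is held. */
    public static long heldTerms(long totalTerms, int indexInterval) {
        return totalTerms / indexInterval;
    }

    /** Estimated bytes used by the held terms of one index. */
    public static long estimateBytes(long totalTerms, int indexInterval) {
        return heldTerms(totalTerms, indexInterval) * bytesPerHeldTerm();
    }

    public static void main(String[] args) {
        long totalTerms = 32_000_000L; // roughly the size of the profiled index
        int indexCount = 30;           // number of indexes loaded at once

        // 128 is the figure from the message; the others are hypothetical.
        for (int interval : new int[] {128, 256, 512, 1024}) {
            double perIndexMb = estimateBytes(totalTerms, interval) / 1e6;
            System.out.printf("1/%d held: %.1f MB per index, %.1f MB for %d indexes%n",
                    interval, perIndexMb, perIndexMb * indexCount, indexCount);
        }
    }
}
```

With the 1/128 figure this reproduces the 33MB per index measured above, which comes to roughly 990MB across 30 indexes. Under the same linear assumption, each doubling of the interval would halve that, at whatever lookup cost the sparser term index carries.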