Message-ID: <437298F1.40205@nuix.com.au>
Date: Thu, 10 Nov 2005 11:48:49 +1100
From: Daniel Noll <daniel@nuix.com.au>
Organization: NUIX Pty Limited
To: Lucene Mailing List <java-user@lucene.apache.org>
Subject: Memory Usage

Hi.

What is the expected memory usage of Lucene these days?  I dug up an old 
email [1] from 2001 which gave the following summary of memory usage:

An IndexReader requires:
  one byte per field per document in the index (norms)
  one open file per file in the index
  1/128 of the Terms in the index
    a Term has two pointers (8 bytes)
    and a String (4 pointers = 24 bytes, one to 16-bit chars)
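To make the "1/128 of the Terms" line concrete: only every 128th term is
kept in memory, and a lookup binary-searches that sample before scanning
the rest of the block. A minimal Java sketch of that idea follows. It is a
toy model, not Lucene's actual TermInfosReader; the class name
SparseTermIndex is hypothetical, and a sorted String[] stands in for the
on-disk term dictionary.

```java
import java.util.Arrays;

// Toy model of a sparse term index (hypothetical, not Lucene's code): only
// every interval-th term of the sorted dictionary is kept in memory. A lookup
// binary-searches that sample, then scans at most `interval` entries of the
// full dictionary, which in Lucene would live on disk.
final class SparseTermIndex {
    private final String[] allTerms;   // stands in for the on-disk dictionary
    private final String[] indexTerms; // the 1-in-interval sample held in RAM
    private final int interval;

    SparseTermIndex(String[] sortedTerms, int interval) {
        this.allTerms = sortedTerms;
        this.interval = interval;
        int n = (sortedTerms.length + interval - 1) / interval; // ceiling division
        this.indexTerms = new String[n];
        for (int i = 0; i < n; i++) {
            indexTerms[i] = sortedTerms[i * interval];
        }
    }

    // Number of terms held in memory: roughly allTerms.length / interval.
    int indexedCount() {
        return indexTerms.length;
    }

    // Returns the term's position in the full dictionary, or -1 if absent.
    int find(String term) {
        int pos = Arrays.binarySearch(indexTerms, term);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the block that
        // could contain the term starts at the sample entry just before that point.
        int block = pos >= 0 ? pos : -pos - 2;
        if (block < 0) {
            return -1; // sorts before the first term
        }
        int start = block * interval;
        int end = Math.min(start + interval, allTerms.length);
        for (int i = start; i < end; i++) { // sequential scan within one block
            int cmp = allTerms[i].compareTo(term);
            if (cmp == 0) return i;
            if (cmp > 0) break;
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] terms = new String[1000];
        for (int i = 0; i < terms.length; i++) {
            terms[i] = String.format("term%04d", i); // zero-padded, so already sorted
        }
        SparseTermIndex idx = new SparseTermIndex(terms, 128);
        System.out.println(idx.indexedCount());   // 8
        System.out.println(idx.find("term0500")); // 500
        System.out.println(idx.find("other"));    // -1
    }
}
```

The interval is the knob: doubling it halves the number of terms held in
RAM, at the cost of a longer sequential scan per lookup.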

From this, we concluded that the norms were by far the biggest problem,
and set about removing them using a patch submitted to the issue
tracker [2].
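Norms dominate because, per the summary above, they cost one byte per
field per document, so they grow with document count no matter how few
terms each document has. A small Java sketch of that arithmetic; the
class name and the index sizes (20 fields, 5,000,000 documents) are
hypothetical, chosen only for illustration.

```java
// Norms cost per the 2001 summary: one byte per field per document.
// Class name and index sizes are hypothetical.
final class NormsEstimate {
    static long normsBytes(int fieldsWithNorms, long maxDoc) {
        return (long) fieldsWithNorms * maxDoc;
    }

    public static void main(String[] args) {
        // Hypothetical index: 20 fields with norms, 5,000,000 documents.
        System.out.println(normsBytes(20, 5_000_000L) / 1_000_000 + " MB"); // 100 MB
        // Dropping norms from 15 of those fields leaves 5 fields' worth.
        System.out.println(normsBytes(5, 5_000_000L) / 1_000_000 + " MB"); // 25 MB
    }
}
```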

However, we've now hit the next hurdle: the terms use much more memory
than the summary above suggests.

Profiling a text index with roughly 32,000,000 terms gives about:
   * 13MB of char[]
   * 6MB of java.lang.String
   * 6MB of org.apache.lucene.index.Term
   * 8MB of org.apache.lucene.index.TermInfo
   => Total = 33MB

Per held term, this works out to about:
   * 52 bytes per char[] (an average; it depends on the term lengths
     in the index)
   * 24 bytes per String
   * 24 bytes per Term
   * 32 bytes per TermInfo
   => 132 bytes for each of the 1-in-128 terms held in memory.
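The per-term figures follow from dividing each profiled total by the
number of terms actually held (32,000,000 / 128 = 250,000). A short Java
sketch of that division, treating MB as 10^6 bytes, which is the
assumption under which the figures come out exactly; the class name is
hypothetical.

```java
// Reproduces the per-term figures from the profiled totals.
// Assumes MB = 10^6 bytes; class name is hypothetical.
final class TermIndexProfile {
    static long bytesPerHeldTerm(long megabytes, long totalTerms, int interval) {
        long heldTerms = totalTerms / interval; // terms actually kept in memory
        return megabytes * 1_000_000L / heldTerms;
    }

    public static void main(String[] args) {
        long totalTerms = 32_000_000L;
        int interval = 128; // 32,000,000 / 128 = 250,000 held terms
        String[] names = {"char[]", "String", "Term", "TermInfo", "total"};
        long[] megabytes = {13, 6, 6, 8, 33};
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + ": "
                + bytesPerHeldTerm(megabytes[i], totalTerms, interval)
                + " bytes per held term");
        }
    }
}
```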

This isn't a problem for a single index, but when we load 30 of these
text indexes at once (roughly 30 times 33MB, or about 1GB, for the
in-memory terms alone), we start running into serious memory problems.

My question is: is this 1/128 figure set in stone, or can it be changed 
without major consequences?
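To put rough numbers on the trade-off in question, here is a small Java
model. It assumes every one of the 30 indexes looks like the profiled one
(32,000,000 terms, about 132 bytes per held term) and that, after the
in-memory binary search, a lookup scans on average about half an interval
of entries. Both are simplifying assumptions, and the class name is
hypothetical; this says nothing about whether Lucene lets the interval be
changed, which is the open question.

```java
// Rough memory/lookup trade-off for different term index intervals.
// Assumptions (hypothetical model): all indexes match the profiled one, the
// per-held-term cost stays at ~132 bytes, and a lookup scans ~interval/2 entries.
final class IntervalTradeoff {
    static final long BYTES_PER_HELD_TERM = 132;

    static long termIndexBytes(int indexes, long termsPerIndex, int interval) {
        return indexes * (termsPerIndex / interval) * BYTES_PER_HELD_TERM;
    }

    public static void main(String[] args) {
        int indexes = 30;
        long terms = 32_000_000L;
        for (int interval = 128; interval <= 1024; interval *= 2) {
            long mb = termIndexBytes(indexes, terms, interval) / 1_000_000;
            int avgScan = interval / 2; // assumed average sequential scan length
            System.out.printf("interval %4d: ~%d MB, ~%d entries scanned per lookup%n",
                interval, mb, avgScan);
        }
    }
}
```

Under this model, going from 128 to 1024 cuts the in-memory terms for 30
indexes from about 990MB to about 123MB, while the average scan per
lookup grows from about 64 to about 512 entries.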

I would rather have an application that uses less memory and runs more
slowly than one that uses all the available RAM just to squeeze out a
bit of extra speed.

Daniel


References:
[1]: http://rubyurl.com/uF1
[2]: http://issues.apache.org/jira/browse/LUCENE-448

-- 
Daniel Noll

NUIX Pty Ltd
Level 8, 143 York Street, Sydney 2000
Phone: (02) 9283 9010
Fax:   (02) 9283 9020