lucene-java-user mailing list archives

From Paul Smith <psm...@aconex.com>
Subject Re: Huge number of Term objects in memory gives OutOfMemory error
Date Mon, 17 Mar 2008 22:07:57 GMT
I'll bet the byte[] arrays are the norms data, stored per field.  If
you have a lot of fields and don't need length normalization for all
of them, I'd suggest turning norms off for the fields whose scoring
doesn't need it.  As I understand it, the calculation is:

1 byte x (# fields with normalization turned on) x (# documents within  
the index)

That adds up pretty quickly!
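
With your ~1,000,000 documents that's roughly 1 MB of heap per
norms-bearing field per open index, multiplied by however many readers
you have open at once.  Here's a minimal sketch of omitting norms with
the 2.3 API (the field names are just examples, not from your schema):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class NoNormsExample {

    // Builds a document whose fields skip the norms byte entirely.
    static Document buildDoc() {
        Document doc = new Document();

        // NO_NORMS indexes the value un-tokenized and writes no norm for it.
        doc.add(new Field("sku", "ABC-123", Field.Store.YES,
                          Field.Index.NO_NORMS));

        // For a tokenized field whose scoring doesn't need length
        // normalization, setOmitNorms(true) gives the same saving.
        Field notes = new Field("notes", "free text goes here",
                                Field.Store.NO, Field.Index.TOKENIZED);
        notes.setOmitNorms(true);
        doc.add(notes);

        return doc;
    }
}

Bear in mind that existing segments keep their norms, so the saving
only shows up as the affected documents are reindexed.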

The char[] and String instances will be your FieldCaches, probably
used for sorting.  Do you do any sorting other than by relevance?
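
For reference, a sorted search like the sketch below is what populates
the FieldCache; the "title" field here is just an example:

import java.io.IOException;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

public class SortedSearchExample {

    // Sorting on a text field makes Lucene un-invert it into the
    // FieldCache (the field's terms as Strings plus a per-document
    // order table), cached for the lifetime of the IndexReader.
    static Hits search(Searcher searcher, Query query) throws IOException {
        Sort byTitle = new Sort(new SortField("title", SortField.STRING));
        return searcher.search(query, byTitle);
    }
}

Because that cache is keyed by the IndexReader, a brand-new searcher
per request has to rebuild it from scratch every time.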

cheers,

Paul

On 18/03/2008, at 8:57 AM, <Richard.Bolen@gxs.com> wrote:

> I'm running Lucene 2.3.1 with Java 1.5.0_14 on 64-bit Linux.  We
> have fairly large collections (~1 GB collection files, ~1,000,000
> documents).  When I try to load test our application with 50 users,
> all doing simple searches via a web interface, we quickly get an
> OutOfMemoryError.  When I do a jmap dump of the heap, this is
> what I see:
>
> Size (bytes)    Count   Class description
> -------------------------------------------------------
> 195818576       4263822 char[]
> 190889608       13259   byte[]
> 172316640       4307916 java.lang.String
> 164813120       4120328 org.apache.lucene.index.TermInfo
> 131823104       4119472 org.apache.lucene.index.Term
> 37729184        604     org.apache.lucene.index.TermInfo[]
> 37729184        604     org.apache.lucene.index.Term[]
>
> So 4 of the top 7 memory consumers are Term related.  We have 2 GB
> of RAM available on the system, but we get OOM errors no matter what
> Java heap settings we try.  Has anyone seen this issue, and does
> anyone know how to solve it?
>
> We use a separate MultiSearcher instance for each search.  (We
> actually have 2 collections that we search via a MultiSearcher.)  We
> tried using a singleton searcher instance, but our collections are
> constantly being updated and a singleton searcher only sees the
> index as it was when it was opened.  Creating new searcher objects
> at search time gives up-to-the-minute results.
>
> I've seen some postings referring to an Index Divisor setting which
> could reduce the number of Terms held in memory, but I have not seen
> how to set this value in Lucene.
>
> Any help would be greatly appreciated.
>
> Rich
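
On the searcher-per-request question quoted above: one middle ground is
to keep the searchers around and refresh them when the index changes,
rather than building a new MultiSearcher for every search.  If I
remember right, 2.3 added IndexReader.reopen(), which loads only the
new segments and shares the rest (and their norms and caches) with the
old reader.  A rough sketch, one instance per collection, with the
refreshed searchers then wrapped in your MultiSearcher (the class and
method names are mine, so check the javadocs for your version):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

public class RefreshingSearcher {

    private IndexReader reader;
    private IndexSearcher searcher;

    public RefreshingSearcher(IndexReader reader) {
        this.reader = reader;
        this.searcher = new IndexSearcher(reader);
    }

    // Call this before each search (or on a timer).  reopen() returns
    // the same reader if nothing changed, otherwise a new reader that
    // shares the unchanged segments with the old one.
    public synchronized IndexSearcher refresh() throws IOException {
        IndexReader newReader = reader.reopen();
        if (newReader != reader) {
            reader.close();   // NB: a real version must wait for searches still using the old reader
            reader = newReader;
            searcher = new IndexSearcher(newReader);
        }
        return searcher;
    }
}

On the index divisor: I think the setting you read about is
IndexReader.setTermInfosIndexDivisor(int) (check the 2.3 javadocs; if I
recall correctly it has to be set before the terms index is first
loaded).  A divisor of 4 keeps only every 4th indexed term in memory,
which should cut those Term/TermInfo counts roughly in proportion, at
the cost of slower term lookups.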

Paul Smith
Core Engineering Manager

Aconex
The easy way to save time and money on your project

696 Bourke Street, Melbourne,
VIC 3000, Australia
Tel: +61 3 9240 0200  Fax: +61 3 9240 0299
Email: psmith@aconex.com  www.aconex.com
