lucene-java-user mailing list archives

From <Richard.Bolen@gxs.com>
Subject RE: Huge number of Term objects in memory gives OutOfMemory error
Date Tue, 18 Mar 2008 11:30:29 GMT

Does each Searchable have its own copy of the Term and TermInfo arrays, so that
the amount of memory used grows with each new Searchable instance?  If so, it
might be worthwhile to implement a singleton MultiSearcher that is closed and
re-opened periodically, something like the sketch below.  What do you think?
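For example (a rough, untested sketch; the index paths are placeholders, and
closing the old searcher while queries may still be in flight on it would need
more care than this shows):

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Searchable;

// One shared MultiSearcher, swapped for a fresh one on a timer so that
// newly indexed documents become visible without opening a new searcher
// for every query.
public class SearcherHolder {
    private final String path1, path2;   // placeholder index locations
    private volatile MultiSearcher current;

    public SearcherHolder(String path1, String path2) throws Exception {
        this.path1 = path1;
        this.path2 = path2;
        this.current = open();
    }

    private MultiSearcher open() throws Exception {
        return new MultiSearcher(new Searchable[] {
                new IndexSearcher(path1), new IndexSearcher(path2) });
    }

    // Call from a background thread every few minutes.
    public synchronized void reopen() throws Exception {
        MultiSearcher old = current;
        current = open();
        old.close();   // unsafe if searches are still running against 'old'
    }

    public MultiSearcher get() {
        return current;
    }
}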

Thanks again,
Rich
________________________________________
From: Michael McCandless [lucene@mikemccandless.com]
Sent: Monday, March 17, 2008 6:27 PM
To: java-user@lucene.apache.org
Subject: Re: Huge number of Term objects in memory gives OutOfMemory error

You can call IndexReader.setTermInfosIndexDivisor(int) to reduce how
many index terms are loaded into memory.  E.g., setting it to 10 will
load 1/10th of what's loaded now, but will slow down searches.
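For example (a minimal sketch against 2.3.x; the index path is a placeholder,
and the divisor must be set before the reader loads any terms, i.e. before the
first search or term lookup):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

public class SetDivisorExample {
    public static void main(String[] args) throws Exception {
        // Raise the term infos index divisor right after opening the
        // reader.  A divisor of 10 keeps roughly 1/10th of the index
        // terms in RAM, at the cost of slower term lookups.
        IndexReader reader = IndexReader.open("/path/to/index");  // placeholder
        reader.setTermInfosIndexDivisor(10);
        IndexSearcher searcher = new IndexSearcher(reader);
        // ... run searches with 'searcher' as usual ...
    }
}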

Also, you should understand why your index has so many terms.  E.g.,
use Luke to peek at the terms and see if they are "valid".  If, for
example, you are accidentally indexing binary content as if it were
text, that can easily create a great many large, unwanted terms.

Mike

<Richard.Bolen@gxs.com> wrote:

> I'm running Lucene 2.3.1 with Java 1.5.0_14 on 64-bit Linux.  We
> have fairly large collections (~1 GB collection files, ~1,000,000
> documents).  When I load-test our application with 50 users, all
> doing simple searches via a web interface, we quickly get an
> OutOfMemory exception.  When I do a jmap dump of the heap, this is
> what I see:
>
> Size       Count    Class description
> ---------------------------------------------------------
> 195818576  4263822  char[]
> 190889608    13259  byte[]
> 172316640  4307916  java.lang.String
> 164813120  4120328  org.apache.lucene.index.TermInfo
> 131823104  4119472  org.apache.lucene.index.Term
>  37729184      604  org.apache.lucene.index.TermInfo[]
>  37729184      604  org.apache.lucene.index.Term[]
>
> So 4 of the top 7 memory consumers are Term-related.  We have 2 GB
> of RAM available on the system, but we get OOM errors no matter
> what Java heap settings we use.  Has anyone seen this issue, and
> does anyone know how to solve it?
>
> We do use separate MultiSearcher instances for each search.  (We
> actually have 2 collections that we search via a MultiSearcher.)
> We tried using a singleton searcher instance, but our collections
> are constantly being updated, and a singleton searcher only sees
> the index as it was when the searcher was opened.  Creating new
> searcher objects at search time gives up-to-the-minute results.
>
> I've seen some postings referring to an index divisor setting that
> could reduce the number of Terms held in memory, but I have not
> seen how to set this value in Lucene.
>
> Any help would be greatly appreciated.
>
> Rich


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
