lucene-java-user mailing list archives

From "Chris Lu" <chris...@gmail.com>
Subject Re: how to estimate how much memory is required to support the large index search
Date Tue, 18 Nov 2008 01:07:21 GMT
Calculation looks right. But what's the "Index divisor" that you mentioned?

-- 
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got
2.6 Million Euro funding!

On Mon, Nov 17, 2008 at 5:00 PM, Zhibin Mai <zbmai@yahoo.com> wrote:

> Aleksander,
>
> I figured out that most of the heap was consumed by the term cache. In our
> case, the index has 233 million terms, and 6.4 million of them were loaded
> into the cache when we ran the search. I roughly calculated how much memory
> each term needs; it is about
> 16 bytes for the Term object + 32 bytes for the TermInfo object + 24 bytes
> for the String object holding the term text + 2 * length(char[]) for the
> term text.
>
> In our case, the average length of the term text is 25 characters, which
> means each term needs at least 122 bytes. The cache for 6.4 million terms
> needs 6.4 million * 122 bytes = 780MB. Adding 200MB for cached norms, the
> cache needs more than 980MB of RAM. We worked around the term cache issue by
> setting the index divisor of the IndexReader to a higher value. In fact,
> search performance is still good even with an index divisor of 4.
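
The estimate above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the class and method names are hypothetical, the per-object sizes are the figures reported in the message (not measured here), and array and JVM overhead are ignored. It also assumes that an index divisor of N keeps roughly every Nth index term in memory, which is how the setting (IndexReader.setTermInfosIndexDivisor(int) in Lucene 2.3, as far as I know) is documented to behave.

```java
public class TermCacheEstimate {
    // Per-object sizes as reported in the message above (assumptions, not measurements).
    static final int TERM_BYTES = 16;       // Term object
    static final int TERM_INFO_BYTES = 32;  // TermInfo object
    static final int STRING_BYTES = 24;     // String object for the term text
    static final int BYTES_PER_CHAR = 2;    // char[] backing the term text

    // Estimated bytes needed to hold one term in the in-memory term index.
    static long bytesPerTerm(int avgTermLength) {
        return TERM_BYTES + TERM_INFO_BYTES + STRING_BYTES
                + (long) BYTES_PER_CHAR * avgTermLength;
    }

    // Estimated bytes for the whole term cache. Assumes a divisor of N
    // keeps about 1 in N of the terms that would otherwise be loaded.
    static long cacheBytes(long loadedTerms, int avgTermLength, int indexDivisor) {
        long keptTerms = loadedTerms / indexDivisor;
        return keptTerms * bytesPerTerm(avgTermLength);
    }

    public static void main(String[] args) {
        long loadedTerms = 6400000L; // terms loaded with the default divisor
        int avgLength = 25;          // average term length in characters
        long normsMb = 200;          // norms cache reported in the message

        System.out.println(bytesPerTerm(avgLength));                          // 122
        long mbDefault = cacheBytes(loadedTerms, avgLength, 1) / 1000000L;
        long mbDivisor4 = cacheBytes(loadedTerms, avgLength, 4) / 1000000L;
        System.out.println(mbDefault + " MB for terms");                      // 780 MB
        System.out.println((mbDefault + normsMb) + " MB with norms");         // 980 MB
        System.out.println(mbDivisor4 + " MB for terms with divisor 4");      // 195 MB
    }
}
```

Under these assumptions, a divisor of 4 cuts the term cache from about 780MB to about 195MB, at the cost of a longer scan per term lookup.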
>
> Thanks,
>
> Zhibin
>
>
>
>
> ________________________________
> From: Aleksander M. Stensby <aleksander.stensby@integrasco.no>
> To: java-user@lucene.apache.org
> Sent: Monday, November 17, 2008 2:31:04 AM
> Subject: Re: how to estimate how much memory is required to support the
> large index search
>
> One major factor that can cause heap space problems is sorting at search
> time. Does your application apply any default sort? The type of the field
> used for sorting also matters for memory consumption.
>
> This issue has been discussed before on the list. (You can search the
> archive for sorting and memory consumption.)
>
> - Aleksander
>
> On Sun, 16 Nov 2008 14:36:36 +0100, Zhibin Mai <zbmai@yahoo.com> wrote:
>
> > Hello,
> >
> > I am a beginner with Lucene. We developed an application that creates
> > and searches an index using Lucene 2.3.1. We would like to know how to
> > estimate how much memory is required to support searching a given index.
> >
> > Recently, the size of the index has reached about 200GB, with 197M
> > documents and 223M terms. Our application has started hitting
> > intermittent "OutOfMemoryError: Java heap space" errors when we use it
> > to search the index. We used JProfiler to get the following memory
> > allocation when we run one keyword search:
> >
> > char[]                                    332MB
> > org.apache.lucene.index.TermInfo          194MB
> > java.lang.String                          146MB
> > org.apache.lucene.index.Term              99,823KB
> > org.apache.lucene.index.Term[]            24,956KB
> > org.apache.lucene.index.TermInfo[]        24,956KB
> >
> > byte[]                                    188MB
> > long[]                                    49,912KB
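
As a rough cross-check, the profile figures can be turned into entry counts. This is only a sketch under stated assumptions: the class and method names are hypothetical, JProfiler's KB is taken to be 1024 bytes, array headers are ignored, a long occupies 8 bytes, and the 16 bytes per Term object is the figure given in the follow-up message earlier in this thread view.

```java
public class ProfileCrossCheck {
    // Number of entries implied by a reported size in KB (1 KB = 1024 bytes assumed).
    static long entries(long kilobytes, int bytesPerEntry) {
        return kilobytes * 1024L / bytesPerEntry;
    }

    public static void main(String[] args) {
        // long[] at 49,912KB, 8 bytes per long
        System.out.println(entries(49912, 8));   // 6388736
        // Term objects at 99,823KB, assuming 16 bytes per Term
        System.out.println(entries(99823, 16));  // 6388672
    }
}
```

Both counts come out at about 6.4 million, which agrees with the number of terms reported as loaded into the cache.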
> >
> > The memory allocation for the first 6 types of objects does not change
> > when we change the search criteria. Could you please advise which major
> > factors affect memory allocation during search, and precisely how they
> > affect memory usage? Is it possible to reduce the memory usage of search?
> >
> >
> > Thank you,
> >
> >
> > Zhibin
> >
> >
> >
>
>
>
> --Aleksander M. Stensby
> Senior software developer
> Integrasco A/S
> www.integrasco.no
>
