lucene-java-user mailing list archives

From "Chris Lu" <chris...@gmail.com>
Subject Re: how to estimate how much memory is required to support the large index search
Date Tue, 18 Nov 2008 05:13:44 GMT
So it looks like you are not really doing much sorting? The index divisor
affects reader.terms(), but it does not help much with sorting.
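
A minimal sketch of the lookup path the divisor affects (the index path,
field, and term below are hypothetical):

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermEnum;

    public class TermScan {
        public static void main(String[] args) throws IOException {
            IndexReader reader = IndexReader.open("/path/to/index"); // placeholder path
            // With a divisor of N, only every Nth entry of the in-memory
            // term index survives, so terms() may scan up to N-1 extra
            // terms on disk to position itself: less heap, slower seeks.
            TermEnum te = reader.terms(new Term("body", "lucene")); // hypothetical field/term
            if (te.term() != null) {
                do {
                    System.out.println(te.term()); // positioned at or after ("body", "lucene")
                } while (te.next());
            }
            te.close();
            reader.close();
        }
    }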

-- 
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got
2.6 Million Euro funding!


On Mon, Nov 17, 2008 at 6:21 PM, Zhibin Mai <zbmai@yahoo.com> wrote:

> It is a cache tuning setting in IndexReader. It can be set via the method
> setTermInfosIndexDivisor(int).
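>
> A minimal sketch of how that setting is applied (the index path is a
> placeholder):
>
>     import java.io.IOException;
>     import org.apache.lucene.index.IndexReader;
>
>     public class DivisorDemo {
>         public static void main(String[] args) throws IOException {
>             IndexReader reader = IndexReader.open("/path/to/index"); // placeholder path
>             // Set this before the terms index is first used: with a
>             // divisor of 4, only every 4th term-index entry is kept in
>             // memory, cutting the term cache to roughly a quarter.
>             reader.setTermInfosIndexDivisor(4);
>             reader.close();
>         }
>     }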
>
> Thanks,
>
> Zhibin
>
>
>
>
> ________________________________
> From: Chris Lu <chris.lu@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Monday, November 17, 2008 7:07:21 PM
> Subject: Re: how to estimate how much memory is required to support the
> large index search
>
> The calculation looks right. But what's the "index divisor" you mentioned?
>
> --
> Chris Lu
> -------------------------
> Instant Scalable Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
> Lucene Database Search in 3 minutes:
>
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> DBSight customer, a shopping comparison site, (anonymous per request) got
> 2.6 Million Euro funding!
>
> On Mon, Nov 17, 2008 at 5:00 PM, Zhibin Mai <zbmai@yahoo.com> wrote:
>
> > Aleksander,
> >
> > I figured out that most of the heap was consumed by the term cache. In
> > our case, the index has 233 million terms, and 6.4 million of them were
> > loaded into the cache when we did the search. I did a rough calculation
> > of how much memory each term needs: about 16 bytes for the Term object
> > + 32 bytes for the TermInfo object + 24 bytes for the String object
> > holding the term text + 2 * length(char[]) bytes for the term text
> > itself.
> >
> > In our case, the average term text length is 25 characters, which means
> > each term needs at least 122 bytes. The cache for 6.4 million terms
> > needs 6.4M * 122 bytes, about 780MB. Plus 200MB for caching norms, the
> > RAM for the cache is larger than 980MB. We worked around the term cache
> > issue by setting the index divisor of the IndexReader to a higher
> > value. Actually, search performance is good even with an index divisor
> > of 4.
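> >
> > A back-of-envelope version of that estimate (the sizes are the rough
> > per-object approximations above, not measured values):
> >
> >     public class TermCacheEstimate {
> >         public static void main(String[] args) {
> >             int termObj = 16;     // Term object
> >             int termInfoObj = 32; // TermInfo object
> >             int stringObj = 24;   // String object for the term text
> >             int avgChars = 25;    // average term text length, in chars
> >             long perTerm = termObj + termInfoObj + stringObj + 2L * avgChars; // 122 bytes
> >
> >             long loadedTerms = 6400000L;          // terms pulled into the cache
> >             long cacheBytes = loadedTerms * perTerm;
> >             long normBytes = 200L * 1000 * 1000;  // ~200MB for norms
> >             System.out.println("term cache: " + cacheBytes / 1000000 + "MB");  // ~780MB
> >             System.out.println("with norms: " + (cacheBytes + normBytes) / 1000000 + "MB"); // ~980MB
> >         }
> >     }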
> >
> > Thanks,
> >
> > Zhibin
> >
> >
> >
> >
> > ________________________________
> > From: Aleksander M. Stensby <aleksander.stensby@integrasco.no>
> > To: java-user@lucene.apache.org
> > Sent: Monday, November 17, 2008 2:31:04 AM
> > Subject: Re: how to estimate how much memory is required to support the
> > large index search
> >
> > One major factor that may result in heap space problems is if you are
> > doing any form of sorting when searching. Do you have any form of
> > default sort in your application? Also, the type of field used for
> > sorting is important with regard to memory consumption.
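> >
> > A minimal sketch of why a field sort costs memory, using the Lucene
> > 2.x-era search API (the field name here is hypothetical):
> >
> >     import java.io.IOException;
> >     import org.apache.lucene.search.Hits;
> >     import org.apache.lucene.search.IndexSearcher;
> >     import org.apache.lucene.search.Query;
> >     import org.apache.lucene.search.Sort;
> >     import org.apache.lucene.search.SortField;
> >
> >     public class SortedSearch {
> >         // The first sorted search fills Lucene's FieldCache with one
> >         // value per document for the sort field, and that array lives
> >         // as long as the reader does. With ~197M docs, even an int
> >         // field is ~197M * 4 bytes = ~790MB; a String field costs
> >         // far more.
> >         static Hits search(IndexSearcher searcher, Query query) throws IOException {
> >             Sort sort = new Sort(new SortField("timestamp", SortField.INT)); // hypothetical field
> >             return searcher.search(query, sort);
> >         }
> >     }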
> >
> > This issue has been discussed before on the list. (You can search the
> > archive for sorting and memory consumption.)
> >
> > - Aleksander
> >
> > On Sun, 16 Nov 2008 14:36:36 +0100, Zhibin Mai <zbmai@yahoo.com> wrote:
> >
> > > Hello,
> > >
> > > I am a beginner at using Lucene. We developed an application to create
> > > and search an index using Lucene 2.3.1. We would like to know how to
> > > estimate how much memory is required to search a given index.
> > >
> > > Recently, the size of the index has reached about 200GB, with 197M
> > > documents and 223M terms. Our application has started throwing
> > > intermittent "OutOfMemoryError: Java heap space" errors when we use it
> > > to search the index. We used JProfiler to get the following memory
> > > allocation during one keyword search:
> > >
> > > char[]                               332MB
> > > org.apache.lucene.index.TermInfo     194MB
> > > java.lang.String                     146MB
> > > org.apache.lucene.index.Term          99,823KB
> > > org.apache.lucene.index.Term          24,956KB
> > > org.apache.lucene.index.TermInfo[]    24,956KB
> > >
> > > byte[]                               188MB
> > > long[]                                49,912KB
> > >
> > > The memory allocation for the first six types of objects does not
> > > change when we change the search criteria. Could you please give me
> > > some advice on which major factors affect memory allocation during a
> > > search, and precisely how? Is it possible to reduce the memory usage
> > > of searching?
> > >
> > >
> > > Thank you,
> > >
> > >
> > > Zhibin
> > >
> > >
> > >
> >
> >
> >
> > --
> > Aleksander M. Stensby
> > Senior software developer
> > Integrasco A/S
> > www.integrasco.no
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> >
> >
>
>
>
>
>
