lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: how to estimate how much memory is required to support the large index search
Date Tue, 18 Nov 2008 10:38:26 GMT

BTW, upcoming changes in Lucene for flexible indexing should improve  
the RAM usage of the terms index substantially:

     https://issues.apache.org/jira/browse/LUCENE-1458

In the current (first) iteration on that patch, TermInfo is no longer  
used at all when loading the index.  I think for a typical index this  
will likely cut in half the RAM used by the terms index.

But... this won't be available for some time (it's still a work in  
progress).

Mike

Chris Lu wrote:

> So it looks like you are not really doing much sorting? The index divisor
> affects reader.terms(), but has little effect on sorting.
>
> -- 
> Chris Lu
> -------------------------
> Instant Scalable Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
> Lucene Database Search in 3 minutes:
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> DBSight customer, a shopping comparison site, (anonymous per  
> request) got
> 2.6 Million Euro funding!
>
>
> On Mon, Nov 17, 2008 at 6:21 PM, Zhibin Mai <zbmai@yahoo.com> wrote:
>
>> It is a cache tuning setting in IndexReader. It can be set via the
>> setTermInfosIndexDivisor(int) method.
>>
>> Thanks,
>>
>> Zhibin
>>
>>
>>
>>
>> ________________________________
>> From: Chris Lu <chris.lu@gmail.com>
>> To: java-user@lucene.apache.org
>> Sent: Monday, November 17, 2008 7:07:21 PM
>> Subject: Re: how to estimate how much memory is required to support  
>> the
>> large index search
>>
>> Calculation looks right. But what's the "Index divisor" that you mentioned?
>>
>> --
>> Chris Lu
>>
>> On Mon, Nov 17, 2008 at 5:00 PM, Zhibin Mai <zbmai@yahoo.com> wrote:
>>
>>> Aleksander,
>>>
>>> I figured out that most of the heap was consumed by the term cache. In
>>> our case, the index has 233 million terms, and 6.4 million of them were
>>> loaded into the cache when we ran the search. I roughly calculated how
>>> much memory each term needs:
>>>
>>>   16 bytes for the Term object
>>> + 32 bytes for the TermInfo object
>>> + 24 bytes for the String object holding the term text
>>> + 2 * length(char[]) bytes for the term text itself
>>>
>>> In our case, the average length of the term text is 25 characters, so
>>> each term needs at least 16 + 32 + 24 + 2 * 25 = 122 bytes. The cache for
>>> 6.4 million terms needs 6.4 million * 122 bytes = 780MB. Adding 200MB for
>>> caching norms, the RAM for the cache is larger than 980MB. We worked
>>> around the term cache issue by setting the index divisor of the
>>> IndexReader to a higher value. In fact, search performance is still good
>>> even with an index divisor of 4.
>>>
>>> Thanks,
>>>
>>> Zhibin
>>>
>>>
>>>
>>>
>>> ________________________________
>>> From: Aleksander M. Stensby <aleksander.stensby@integrasco.no>
>>> To: java-user@lucene.apache.org
>>> Sent: Monday, November 17, 2008 2:31:04 AM
>>> Subject: Re: how to estimate how much memory is required to  
>>> support the
>>> large index search
>>>
>>> One major factor that may result in heap space problems is if you are
>>> doing any form of sorting when searching. Do you have any form of default
>>> sort in your application? Also, the type of field used for sorting is
>>> important with regard to memory consumption.
>>>
>>> This issue has been discussed before on the list. (You can search the
>>> archive for sorting and memory consumption.)
>>>
>>> - Aleksander
>>>
>>> On Sun, 16 Nov 2008 14:36:36 +0100, Zhibin Mai <zbmai@yahoo.com>  
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am a beginner with Lucene. We developed an application that creates
>>>> and searches an index using Lucene 2.3.1. We would like to know how to
>>>> estimate how much memory is required to support searching a given index.
>>>>
>>>> Recently, the size of the index has reached about 200GB, with 197M
>>>> documents and 223M terms. Our application has started to hit
>>>> intermittent "OutOfMemoryError: Java heap space" errors when we use it
>>>> to search the index. We used JProfiler to get the following memory
>>>> allocation for a single keyword search:
>>>>
>>>> char[]                                332MB
>>>> org.apache.lucene.index.TermInfo      194MB
>>>> java.lang.String                      146MB
>>>> org.apache.lucene.index.Term          99,823KB
>>>> org.apache.lucene.index.Term          24,956KB
>>>> org.apache.lucene.index.TermInfo[]    24,956KB
>>>>
>>>> byte[]                                188MB
>>>> long[]                                49,912KB
>>>>
>>>> The memory allocation for the first 6 types of objects does not change
>>>> when we change the search criteria. Could you please advise which major
>>>> factors affect memory allocation, and how exactly those factors affect
>>>> memory usage during search? Is it possible to reduce the memory usage
>>>> during search?
>>>>
>>>>
>>>> Thank you,
>>>>
>>>>
>>>> Zhibin
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --Aleksander M. Stensby
>>> Senior software developer
>>> Integrasco A/S
>>> www.integrasco.no
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
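
The per-term estimate quoted above can be reproduced with a few lines of Java. This is only a rough sketch of the arithmetic in the thread, not anything from Lucene itself: the class and method names are hypothetical, the object sizes are the figures Zhibin quotes (they depend on the JVM), and the assumption that a divisor of N keeps roughly one in N of the loaded terms is a simplification. It also assumes that "KB" in the profiler output means 1024 bytes.

```java
// Hypothetical helper, not part of Lucene: redoes the back-of-envelope
// estimate from the thread using the per-object sizes quoted there.
final class TermCacheEstimate {
    // Sizes as quoted in the thread; they are JVM-dependent assumptions.
    static final long TERM_BYTES = 16;       // Term object
    static final long TERM_INFO_BYTES = 32;  // TermInfo object
    static final long STRING_BYTES = 24;     // String object for the term text
    static final long BYTES_PER_CHAR = 2;    // Java char in the backing char[]

    // Estimated bytes held per cached term for a given average term length.
    static long bytesPerTerm(int avgTermLength) {
        return TERM_BYTES + TERM_INFO_BYTES + STRING_BYTES
                + BYTES_PER_CHAR * avgTermLength;
    }

    // Estimated cache size, assuming a divisor of N keeps about 1/N of the
    // terms that would otherwise be loaded (a simplifying assumption).
    static long cacheBytes(long loadedTerms, int avgTermLength, int indexDivisor) {
        long keptTerms = loadedTerms / indexDivisor;
        return keptTerms * bytesPerTerm(avgTermLength);
    }

    // Rough instance count implied by a profiler total and a per-object size.
    static long instancesFromProfile(long totalBytes, long bytesPerInstance) {
        return totalBytes / bytesPerInstance;
    }

    public static void main(String[] args) {
        long loaded = 6_400_000L; // terms loaded into the cache, per the thread
        int avgLen = 25;          // average term length, per the thread

        System.out.println("bytes per term: " + bytesPerTerm(avgLen));
        System.out.println("cache bytes, divisor 1: " + cacheBytes(loaded, avgLen, 1));
        System.out.println("cache bytes, divisor 4: " + cacheBytes(loaded, avgLen, 4));

        // Cross-check against the profiler row "Term 99,823KB" at 16 bytes each.
        System.out.println("Term instances implied by profile: "
                + instancesFromProfile(99_823L * 1024L, TERM_BYTES));
    }
}
```

Dividing the 99,823KB profiler figure for Term by 16 bytes per object gives a count close to the 6.4 million loaded terms mentioned in the thread, which suggests the quoted per-object sizes are at least self-consistent. The 200MB for norms is not modeled here.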

