lucene-java-user mailing list archives

From "Todd Benge" <todd.be...@gmail.com>
Subject Re: OutOfMemory Problems Lucene 2.4 / Tomcat
Date Thu, 30 Oct 2008 02:30:00 GMT
Thanks Mark.  I appreciate the help.

I thought our memory might be low, but I wanted to verify whether there is
any way to control memory usage.  We'll likely upgrade the memory on the
machines, but that may just delay the inevitable.
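
For example, if Lucene 2.4's IndexReader.setTermInfosIndexDivisor does what
I think it does (unverified, and both the helper name and the divisor value
below are just my own sketch), something like this would let us trade
term-lookup speed for term-index RAM:

import java.io.File;
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;

public static IndexSearcher openSearcher(File file) throws IOException {
    // Keep only every 4th indexed term in RAM; term lookups get a bit
    // slower, but Term/TermInfo memory should drop roughly 4x.
    IndexReader reader = IndexReader.open(FSDirectory.getDirectory(file));
    reader.setTermInfosIndexDivisor(4);
    return new IndexSearcher(reader);
}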

Wondering if anyone else has encountered similar issues with indices of
a similar size.  I've been thinking we will need to move to a clustered
solution and have been reading about Hadoop, Nutch, Solr & Terracotta as
possibilities for things like index sharding.

Has anyone implemented a solution using Hadoop or Terracotta for a
large-scale system?  Just wondering about the pros and cons of the
various approaches.

Thanks,

Todd

On Wed, Oct 29, 2008 at 6:07 PM, Mark Miller <markrmiller@gmail.com> wrote:
> The Term/TermInfo and IndexReader internals are probably on the low end
> compared to the size of your field caches (needed for sorting).  If you are
> sorting by String, I think the space needed is 32 bits x number of docs,
> plus an array holding all of the unique terms.  So taking 300 million docs
> (I know you are actually breaking it up smaller than that, but for example),
> ignoring details like String chars being variable byte lengths and storing
> the length, and picking 50,000 unique terms at 6 chars each:
>
> 32 bits x 300,000,000 + 50,000 x 6 x 16 bits ≈ 1,144.98 MB
>
> That's per field you're sorting on.  If you are sorting on an int field it
> should be closer to 32 bits x num docs; for shorts, 16 bits x num docs, etc.
>
> So you have those field caches, plus the IndexReader Term/TermInfo
> structures, plus whatever RAM your app needs beyond Lucene.  My guess is
> that 4 gig might just not *quite* cut it.
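>
> A quick back-of-the-envelope version of that estimate in Java (the doc
> and term counts are just the assumed example figures above):
>
>     // Rough FieldCache footprint for sorting on one String field:
>     // a 32-bit ord per document, plus the unique term values.
>     long docs = 300000000L;       // documents searched (assumed)
>     long uniqueTerms = 50000L;    // unique sort terms (assumed)
>     long charsPerTerm = 6L;       // average term length (assumed)
>     long ordBytes = docs * 4L;                         // 32 bits per doc
>     long termBytes = uniqueTerms * charsPerTerm * 2L;  // 16-bit Java chars
>     System.out.println((ordBytes + termBytes) / (1024.0 * 1024.0) + " MB");
>     // prints ~1144.98 MB -- and that's per sorted String field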
>
> Todd Benge wrote:
>>
>> There are usually only a couple of sort fields, but a bunch of terms in
>> the various indices.  The terms are user-entered across various media, so
>> the number of terms is very large.
>>
>> Thanks for the help.
>>
>> Todd
>>
>>
>>
>> On 10/29/08, Todd Benge <todd.benge@gmail.com> wrote:
>>
>>>
>>> Hi,
>>>
>>> I'm the lead engineer for search on a large website that uses Lucene.
>>>
>>> We're indexing about 300M documents in ~100 indices.  The indices add
>>> up to ~60G.
>>>
>>> The indices are split across 4 different MultiSearchers, with the
>>> largest handling ~50G.
>>>
>>> The code is basically like the following:
>>>
>>> private static MultiSearcher searcher;
>>>
>>> public void init(File[] files) throws IOException {
>>>     // One IndexSearcher per index directory, wrapped in a single
>>>     // shared MultiSearcher.
>>>     IndexSearcher[] searchers = new IndexSearcher[files.length];
>>>     int i = 0;
>>>     for (File file : files) {
>>>         searchers[i++] = new IndexSearcher(FSDirectory.getDirectory(file));
>>>     }
>>>     searcher = new MultiSearcher(searchers);
>>> }
>>>
>>> public Searcher getSearcher() {
>>>     return searcher;
>>> }
>>>
>>> We're seeing heavy caching of Term & TermInfo objects in Lucene 2.4.
>>> Performance is good, but servers are consistently hanging with
>>> OutOfMemory errors.
>>>
>>> We're allocating 4G of heap to each server.
>>>
>>> Is there any way to control the amount of memory Lucene consumes for
>>> caching?  Any other suggestions for fixing the memory errors?
>>>
>>> Thanks,
>>>
>>> Todd
>>>
>>>
>>
>>
>
>


