lucene-general mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: problems with large Lucene index (reason found)
Date Fri, 13 Mar 2009 09:55:43 GMT

Gak, what a horrible bug!

It seems widespread (JRE 1.5, 1.6, on Linux & Windows OSs).  And it's  
been open for almost 2.5 years.  I just added a comment & voted for  
the bug.

Does it also occur on a 64 bit JRE?

If you still allocate the full array, but read several smaller chunks  
into it, do you still hit the bug?

Mike

lucene@digiatlas.org wrote:

>
> I now know the cause of the problem. Increasing heap space actually  
> breaks Lucene when reading large indexes.
>
> Details on why can be found here:
>
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546
>
>
> Lucene is trying to read a huge block (about 2580Mb) at the point of  
> failure. While it allocates the required bytes in method  
> MultiSegmentReader.norms(), line 334, just fine, it is only when it  
> attempts to use this array in a call to RandomAccessFile.readBytes()  
> that it gets OOM. This is caused by a bug in the native code for the  
> Java IO.
>
> As observed in the bug report, large heap space actually causes the  
> bug to appear. When I reduced my heap from 1200M to 1000M the  
> exception was never generated and the code completed correctly and  
> it reported the correct number of search hits in the Hibernate  
> Search version of my program.
>
> This isn't good - I need as much memory as possible because I intend  
> to run my search as a web service.
>
> The work-around would be to read the file in small chunks, but I am  
> not familiar with the Lucene code so I am unsure how that would be  
> done in a global sense (i.e.: does it really need to allocate a  
> buffer of that size in MultiSegmentReader?)
>
> The obvious solution (which I haven't tried yet) would be to patch  
> the point in FSDirectory where the java IO read occurs - looping  
> with a small buffer for the read and then concatenating the result  
> back into Lucene's byte array.
>
> Thanks for the comments on this problem from people on this list.
>
>
>
> Quoting Ted Dunning <ted.dunning@gmail.com>:
>
>> try running with verbose gc.  That will give you more details about  
>> what is
>> happening.
>>
>> Even better, run with jconsole on the side so that you get really  
>> detailed
>> information on memory pools.
>>
>> On Thu, Mar 12, 2009 at 7:30 AM, <lucene@digiatlas.org> wrote:
>>
>>> Thanks Mike and Jokin for your comments on the memory problem. I  
>>> have
>>> submitted the query to the Hibernate Search list although I  
>>> haven't seen a
>>> response yet.
>>>
>>> In the meantime I did my own investigating in the code (I'd rather  
>>> have
>>> avoided this!). I'm seeing results that don't make any sense and  
>>> maybe
>>> someone here with more experience of Lucene and the way memory is  
>>> allocated
>>> by the JVM may shed light on, what to me, are quite illogical  
>>> observations.
>>>
>>> As you may recall I had a stand-alone Lucene search and a  
>>> Hibernate Search
>>> version.  Looking in the HS code did not shed any light on the  
>>> issue. I took
>>> my stand-alone Lucene code and put it in a method and replaced the  
>>> search in
>>> the HS class (the constructor of QueryHits.java) with the call to  
>>> my method.
>>> Bear in mind this method is the same code as posted in my earlier  
>>> message -
>>> it sets up the Lucene search from scratch (i.e.: no data  
>>> structures created
>>> by HS were used). So, effectively I was calling my stand-alone  
>>> code after
>>> any setup done by Hibernate and any memory it may have allocated  
>>> (which
>>> turned out to be a few Mb).
>>>
>>> I get OOM!  Printing the free memory at this point shows bags of  
>>> memory
>>> left. Indeed, the same free memory (+/- a few Mb) as the stand- 
>>> alone Lucene
>>> version!
>>>
>>> I then instrumented the Lucene method where the OOM is occuring
>>> (FSDirectory.readInternal()).  I cannot understand the results I  
>>> am seeing.
>>> Below is a snippet of the output of each with the code around  
>>> FSDirectory
>>> line 598 as follows:
>>>
>>> ...
>>>       do {
>>>               long tot = Runtime.getRuntime().totalMemory();
>>>               long free =Runtime.getRuntime().freeMemory();
>>> System.out.println("LUCENE: offset="+offset+"  total="+total+"
>>> len-total="+(len-total)+"   free mem="+free+"  used ="+(tot-free));
>>>         int i = file.read(b, offset+total, len-total);
>>> ...
>>>
>>>
>>>
>>> The stand-alone version:
>>>
>>> ...
>>> LUCENE: offset=0  total=0  len-total=401   free mem=918576864  used
>>> =330080544
>>> LUCENE: offset=0  total=0  len-total=1024   free mem=918576864  used
>>> =330080544
>>> LUCENE: offset=0  total=0  len-total=883   free mem=918576864  used
>>> =330080544
>>> LUCENE: offset=0  total=0  len-total=1024   free mem=918576864  used
>>> =330080544
>>> LUCENE: offset=0  total=0  len-total=1024   free mem=918576864  used
>>> =330080544
>>> LUCENE: offset=0  total=0  len-total=1024   free mem=918576864  used
>>> =330080544
>>> LUCENE: offset=0  total=0  len-total=1024   free mem=918576864  used
>>> =330080544
>>> LUCENE: offset=0  total=0  len-total=1024   free mem=918576864  used
>>> =330080544
>>> LUCENE: offset=0  total=0  len-total=1024   free mem=918576864  used
>>> =330080544
>>> LUCENE: offset=0  total=0  len-total=1024   free mem=918576864  used
>>> =330080544
>>> LUCENE: offset=0  total=0  len-total=1024   free mem=918576864  used
>>> =330080544
>>> LUCENE: offset=0  total=0  len-total=209000000   free  
>>> mem=631122912  used
>>> =617534496
>>> LUCENE: offset=209000000  total=0  len-total=20900000   free  
>>> mem=631122912
>>> used =617534496
>>> LUCENE: offset=229900000  total=0  len-total=20900000   free  
>>> mem=631122912
>>> used =617534496
>>> LUCENE: offset=250800000  total=0  len-total=20900000   free  
>>> mem=631122912
>>> used =617534496
>>> ...
>>> completes successfully!
>>>
>>>
>>> The method called via Hibernate Search:
>>>
>>> ...
>>> LUCENE: offset=0  total=0  len-total=401   free mem=924185480  used
>>> =334892152
>>> LUCENE: offset=0  total=0  len-total=1024   free mem=924185480  used
>>> =334892152
>>> LUCENE: offset=0  total=0  len-total=883   free mem=924185480  used
>>> =334892152
>>> LUCENE: offset=0  total=0  len-total=1024   free mem=924185480  used
>>> =334892152
>>> LUCENE: offset=0  total=0  len-total=1024   free mem=924185480  used
>>> =334892152
>>> LUCENE: offset=0  total=0  len-total=1024   free mem=924185480  used
>>> =334892152
>>> LUCENE: offset=0  total=0  len-total=1024   free mem=924185480  used
>>> =334892152
>>> LUCENE: offset=0  total=0  len-total=1024   free mem=924185480  used
>>> =334892152
>>> LUCENE: offset=0  total=0  len-total=1024   free mem=924185480  used
>>> =334892152
>>> LUCENE: offset=0  total=0  len-total=1024   free mem=924185480  used
>>> =334892152
>>> LUCENE: offset=0  total=0  len-total=1024   free mem=924185480  used
>>> =334892152
>>> LUCENE: offset=0  total=0  len-total=209000000   free  
>>> mem=636731528  used
>>> =622346104
>>> Exception in thread "main" java.lang.OutOfMemoryError
>>>       at java.io.RandomAccessFile.readBytes(Native Method)
>>>       at java.io.RandomAccessFile.read(Unknown Source)
>>>       at
>>> org.apache.lucene.store.FSDirectory 
>>> $FSIndexInput.readInternal(FSDirectory.java:599)
>>> ... fails with exception!
>>>
>>>
>>> Note that the HS version has slightly more free memory because I  
>>> ran it
>>> with -Xms1210M as opposed to -Xms1200M for the stand-alone to  
>>> offset any
>>> memory used by HS when it starts up.
>>>
>>> As you can see, these are identical for all practical purposes.   
>>> So what
>>> gives?
>>>
>>> I'm stumped, so any suggestions appreciated.
>>>
>>> Thanks.
>>>
>>>
>>> Quoting Michael McCandless <lucene@mikemccandless.com>:
>>>
>>>
>>>> Unfortunately, I'm not familiar with exactly what Hibernate  
>>>> search does
>>>> with the Lucene APIs.
>>>>
>>>> It must be doing something beyond what your standalone Lucene  
>>>> test case
>>>> does.
>>>>
>>>> Maybe ask this question on the Hibernate list?
>>>>
>>>> Mike
>>>>
>>>
>>>
>>>
>>
>>
>> --
>> Ted Dunning, CTO
>> DeepDyve
>>
>> 111 West Evelyn Ave. Ste. 202
>> Sunnyvale, CA 94086
>> www.deepdyve.com
>> 408-773-0110 ext. 738
>> 858-414-0013 (m)
>> 408-773-0220 (fax)
>>
>
>
>


Mime
View raw message