lucene-general mailing list archives

From: Michael McCandless <lucene@mikemccandless.com>
Subject: Re: problems with large Lucene index (reason found)
Date: Tue, 17 Mar 2009 12:11:02 GMT

I've opened:

     https://issues.apache.org/jira/browse/LUCENE-1566

for this.  Cameron, could you attach your patch to that issue?  Thanks.

Mike

lucene@digiatlas.org wrote:

>
> Yes, I overrode the read() method in
> FSDirectory.FSIndexInput.Descriptor and forced it to read in 50 MB
> chunks and do an arraycopy() into the array created by Lucene. It
> now works with any heap size and doesn't get OOM.
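>
> In outline, the override did something like this (a simplified
> sketch of the idea, not the patch itself; the chunk size and names
> are illustrative):
>
>     // Inside FSDirectory.FSIndexInput.Descriptor (which extends
>     // RandomAccessFile): cap each native read at CHUNK bytes so
>     // readBytes() never sees one huge request.
>     private static final int CHUNK = 50 * 1024 * 1024; // 50 MB
>
>     public int read(byte[] b, int off, int len) throws IOException {
>       if (len <= CHUNK)
>         return super.read(b, off, len);      // small reads unchanged
>       byte[] tmp = new byte[CHUNK];          // small staging buffer
>       int total = 0;
>       while (total < len) {
>         int n = super.read(tmp, 0, Math.min(CHUNK, len - total));
>         if (n == -1)                         // EOF before request filled
>           return total == 0 ? -1 : total;
>         // arraycopy() each chunk into the array created by Lucene
>         System.arraycopy(tmp, 0, b, off + total, n);
>         total += n;
>       }
>       return total;
>     }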
>
> There may be other places in the Lucene code where this could
> happen. At present it seems to be working fine on our largest index
> (17 GB), but I have only retrieved the result size, not the actual
> data, so perhaps there are other calls to read() with large buffer
> sizes.
>
> As this bug does not look like it will be fixed in the near future,
> it might be worth putting a workaround into the Lucene code itself. I
> think it would be safe to read in chunks of up to 100 MB, and I don't
> think it would affect performance to any great degree.
>
> It's pleasing to see that Lucene can easily handle such huge  
> indexes, although this bug is obviously quite an impediment to doing  
> so.
>
> regards,
> Cameron Newham
>
> Quoting Michael McCandless <lucene@mikemccandless.com>:
>
>>
>> Gak, what a horrible bug!
>>
>> It seems widespread (JRE 1.5 and 1.6, on both Linux and Windows).
>> And it's been open for almost 2.5 years. I just added a comment and
>> voted for the bug.
>>
>> Does it also occur on a 64-bit JRE?
>>
>> If you still allocate the full array, but read several smaller chunks
>> into it, do you still hit the bug?
>>
>> Mike
>>
>> lucene@digiatlas.org wrote:
>>
>>>
>>> I now know the cause of the problem. Increasing heap space
>>> actually breaks Lucene when reading large indexes.
>>>
>>> Details on why can be found here:
>>>
>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546
>>>
>>>
>>> At the point of failure Lucene is trying to read a huge block
>>> (about 2580 MB). It allocates the required byte array in
>>> MultiSegmentReader.norms(), line 334, just fine; the OOM occurs
>>> only when that array is passed to RandomAccessFile.readBytes().
>>> This is caused by a bug in the native code behind Java's file I/O.
>>>
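>>> The failing pattern can be reproduced in isolation with something
>>> like the following (an illustrative sketch, not my actual code;
>>> point it at any file of a few hundred MB and run with a large
>>> -Xms/-Xmx):
>>>
>>>     import java.io.File;
>>>     import java.io.RandomAccessFile;
>>>
>>>     public class BigReadRepro {
>>>       public static void main(String[] args) throws Exception {
>>>         // Allocating the large array on the Java heap succeeds...
>>>         byte[] buf = new byte[200 * 1024 * 1024];
>>>         RandomAccessFile raf =
>>>             new RandomAccessFile(new File(args[0]), "r");
>>>         try {
>>>           // ...but the native readBytes() allocates a temporary
>>>           // C-heap buffer of the full request size, which can fail
>>>           // when a large Java heap has squeezed the address space.
>>>           int n = raf.read(buf, 0, buf.length);
>>>           System.out.println("read " + n + " bytes");
>>>         } finally {
>>>           raf.close();
>>>         }
>>>       }
>>>     }
>>>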
>>> As observed in the bug report, a large heap actually causes the
>>> bug to appear. When I reduced my heap from 1200M to 1000M, the
>>> exception was never thrown, the code completed correctly, and the
>>> Hibernate Search version of my program reported the correct number
>>> of search hits.
>>>
>>> This isn't good - I need as much memory as possible because I   
>>> intend to run my search as a web service.
>>>
>>> The work-around would be to read the file in small chunks, but I
>>> am not familiar with the Lucene code, so I am unsure how that would
>>> be done in a global sense (i.e., does it really need to allocate a
>>> buffer of that size in MultiSegmentReader?).
>>>
>>> The obvious solution (which I haven't tried yet) would be to patch
>>> the point in FSDirectory where the Java I/O read occurs: loop over
>>> the read with a small buffer and copy each chunk back into Lucene's
>>> byte array.
>>>
>>> Thanks for the comments on this problem from people on this list.
>>>
>>>
>>>
>>> Quoting Ted Dunning <ted.dunning@gmail.com>:
>>>
>>>> Try running with verbose GC. That will give you more details
>>>> about what is happening.
>>>>
>>>> Even better, run with jconsole on the side so that you get really
>>>> detailed information on memory pools.
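>>>>
>>>> For example (standard Sun JVM options; the heap sizes, main class,
>>>> and pid are placeholders for your setup):
>>>>
>>>>     java -verbose:gc -XX:+PrintGCDetails -Xms1200m -Xmx1200m YourSearchApp
>>>>     jconsole <pid>    # live charts of the JVM's memory pools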
>>>>
>>>> On Thu, Mar 12, 2009 at 7:30 AM, <lucene@digiatlas.org> wrote:
>>>>
>>>>> Thanks Mike and Jokin for your comments on the memory problem. I
>>>>> have submitted the query to the Hibernate Search list, although I
>>>>> haven't seen a response yet.
>>>>>
>>>>> In the meantime I did my own investigating in the code (I'd
>>>>> rather have avoided this!). I'm seeing results that don't make
>>>>> any sense, and maybe someone here with more experience of Lucene
>>>>> and the way memory is allocated by the JVM can shed light on
>>>>> what are, to me, quite illogical observations.
>>>>>
>>>>> As you may recall, I had a stand-alone Lucene search and a
>>>>> Hibernate Search version. Looking in the HS code did not shed any
>>>>> light on the issue. I took my stand-alone Lucene code, put it in
>>>>> a method, and replaced the search in the HS class (the
>>>>> constructor of QueryHits.java) with a call to my method. Bear in
>>>>> mind this method is the same code as posted in my earlier
>>>>> message - it sets up the Lucene search from scratch (i.e., no
>>>>> data structures created by HS were used). So, effectively, I was
>>>>> calling my stand-alone code after any setup done by Hibernate and
>>>>> after any memory it may have allocated (which turned out to be a
>>>>> few MB).
>>>>>
>>>>> I get OOM! Printing the free memory at this point shows bags of
>>>>> memory left - indeed, the same free memory (+/- a few MB) as the
>>>>> stand-alone Lucene version!
>>>>>
>>>>> I then instrumented the Lucene method where the OOM is occurring
>>>>> (FSDirectory.readInternal()). I cannot understand the results I
>>>>> am seeing. Below is a snippet of the output of each run, with the
>>>>> instrumentation around FSDirectory line 598 as follows:
>>>>>
>>>>> ...
>>>>> do {
>>>>>     // Snapshot heap state just before each read() call.
>>>>>     long tot = Runtime.getRuntime().totalMemory();
>>>>>     long free = Runtime.getRuntime().freeMemory();
>>>>>     System.out.println("LUCENE: offset=" + offset
>>>>>         + "  total=" + total + "  len-total=" + (len - total)
>>>>>         + "  free mem=" + free + "  used=" + (tot - free));
>>>>>     int i = file.read(b, offset + total, len - total);
>>>>> ...
>>>>>
>>>>>
>>>>>
>>>>> The stand-alone version:
>>>>>
>>>>> ...
>>>>> LUCENE: offset=0  total=0  len-total=401  free mem=918576864  used=330080544
>>>>> LUCENE: offset=0  total=0  len-total=1024  free mem=918576864  used=330080544
>>>>> LUCENE: offset=0  total=0  len-total=883  free mem=918576864  used=330080544
>>>>> LUCENE: offset=0  total=0  len-total=1024  free mem=918576864  used=330080544
>>>>> LUCENE: offset=0  total=0  len-total=1024  free mem=918576864  used=330080544
>>>>> LUCENE: offset=0  total=0  len-total=1024  free mem=918576864  used=330080544
>>>>> LUCENE: offset=0  total=0  len-total=1024  free mem=918576864  used=330080544
>>>>> LUCENE: offset=0  total=0  len-total=1024  free mem=918576864  used=330080544
>>>>> LUCENE: offset=0  total=0  len-total=1024  free mem=918576864  used=330080544
>>>>> LUCENE: offset=0  total=0  len-total=1024  free mem=918576864  used=330080544
>>>>> LUCENE: offset=0  total=0  len-total=1024  free mem=918576864  used=330080544
>>>>> LUCENE: offset=0  total=0  len-total=209000000  free mem=631122912  used=617534496
>>>>> LUCENE: offset=209000000  total=0  len-total=20900000  free mem=631122912  used=617534496
>>>>> LUCENE: offset=229900000  total=0  len-total=20900000  free mem=631122912  used=617534496
>>>>> LUCENE: offset=250800000  total=0  len-total=20900000  free mem=631122912  used=617534496
>>>>> ...
>>>>> completes successfully!
>>>>>
>>>>>
>>>>> The method called via Hibernate Search:
>>>>>
>>>>> ...
>>>>> LUCENE: offset=0  total=0  len-total=401  free mem=924185480  used=334892152
>>>>> LUCENE: offset=0  total=0  len-total=1024  free mem=924185480  used=334892152
>>>>> LUCENE: offset=0  total=0  len-total=883  free mem=924185480  used=334892152
>>>>> LUCENE: offset=0  total=0  len-total=1024  free mem=924185480  used=334892152
>>>>> LUCENE: offset=0  total=0  len-total=1024  free mem=924185480  used=334892152
>>>>> LUCENE: offset=0  total=0  len-total=1024  free mem=924185480  used=334892152
>>>>> LUCENE: offset=0  total=0  len-total=1024  free mem=924185480  used=334892152
>>>>> LUCENE: offset=0  total=0  len-total=1024  free mem=924185480  used=334892152
>>>>> LUCENE: offset=0  total=0  len-total=1024  free mem=924185480  used=334892152
>>>>> LUCENE: offset=0  total=0  len-total=1024  free mem=924185480  used=334892152
>>>>> LUCENE: offset=0  total=0  len-total=1024  free mem=924185480  used=334892152
>>>>> LUCENE: offset=0  total=0  len-total=209000000  free mem=636731528  used=622346104
>>>>> Exception in thread "main" java.lang.OutOfMemoryError
>>>>>     at java.io.RandomAccessFile.readBytes(Native Method)
>>>>>     at java.io.RandomAccessFile.read(Unknown Source)
>>>>>     at org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:599)
>>>>> ... fails with exception!
>>>>>
>>>>>
>>>>> Note that the HS version has slightly more free memory because I
>>>>> ran it with -Xms1210M, as opposed to -Xms1200M for the
>>>>> stand-alone version, to offset any memory used by HS when it
>>>>> starts up.
>>>>>
>>>>> As you can see, these are identical for all practical purposes.   
>>>>> So what
>>>>> gives?
>>>>>
>>>>> I'm stumped, so any suggestions appreciated.
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>> Quoting Michael McCandless <lucene@mikemccandless.com>:
>>>>>
>>>>>
>>>>>> Unfortunately, I'm not familiar with exactly what Hibernate
>>>>>> Search does with the Lucene APIs.
>>>>>>
>>>>>> It must be doing something beyond what your standalone Lucene  
>>>>>> test case
>>>>>> does.
>>>>>>
>>>>>> Maybe ask this question on the Hibernate list?
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Ted Dunning, CTO
>>>> DeepDyve
>>>>
>>>> 111 West Evelyn Ave. Ste. 202
>>>> Sunnyvale, CA 94086
>>>> www.deepdyve.com
>>>> 408-773-0110 ext. 738
>>>> 858-414-0013 (m)
>>>> 408-773-0220 (fax)
>>>>
>>>
>>>
>>>
>
>
>

