lucene-general mailing list archives

From Michael McCandless <>
Subject Re: problems with large Lucene index (reason found)
Date Fri, 13 Mar 2009 15:16:16 GMT

wrote:

> Yes, I overrode the read() method in  
> FSDirectory.FSIndexInput.Descriptor and forced it to read in 50Mb  
> chunks and do an arraycopy() into the array created by Lucene. It  
> now works with any heap size and doesn't get OOM.

You shouldn't need to do the extra arraycopy?  RandomAccessFile can  
read into a particular offset/len inside the array.  Does that not work?
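A chunked read that targets the caller's array directly, as suggested above, might look like the sketch below. This is illustrative only, not Lucene's actual code: the class and method names are made up, and the chunk size is kept tiny so the demo runs quickly (the thread suggests something on the order of 100 MB in practice). The point is that `RandomAccessFile.read(byte[], off, len)` fills a slice of the destination array in place, so no intermediate buffer or `arraycopy()` is needed:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

public class ChunkedRead {
    // Illustrative chunk size, tiny only so the demo exercises several
    // iterations; in practice this would be large (e.g. 100 * 1024 * 1024).
    private static final int CHUNK_SIZE = 4;

    // Read len bytes from file into b starting at offset, in CHUNK_SIZE
    // pieces. Each read writes straight into the caller's array via
    // RandomAccessFile.read(byte[], off, len), avoiding any arraycopy().
    static void readChunked(RandomAccessFile file, byte[] b, int offset, int len)
            throws IOException {
        int done = 0;
        while (done < len) {
            int toRead = Math.min(CHUNK_SIZE, len - done);
            int n = file.read(b, offset + done, toRead);
            if (n == -1) {
                throw new IOException("unexpected EOF while reading chunk");
            }
            done += n;
        }
    }

    public static void main(String[] args) throws IOException {
        // Write 11 known bytes to a temp file, then read them back in
        // 4-byte chunks and verify the destination array matches.
        File tmp = File.createTempFile("chunked", ".bin");
        tmp.deleteOnExit();
        byte[] data = new byte[11];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
            raf.write(data);
            raf.seek(0);
            byte[] dest = new byte[data.length];
            readChunked(raf, dest, 0, dest.length);
            System.out.println(Arrays.equals(data, dest)); // prints true
        }
    }
}
```

Because each `read()` call hands the OS at most `CHUNK_SIZE` bytes of the destination array, no single native allocation has to cover the whole request, which is what triggers the OOM described in this thread.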

> There may be other areas this could happen in the Lucene code  
> (although at present it seems to be working fine for me on our  
> largest, 17Gb, index but I haven't tried accessing data yet - only  
> getting the result size - so perhaps there are other calls to read()  
> with large buffer sizes).
> As this bug does not look like it will be fixed in the near future,  
> it might be an idea to put in place a fix in the Lucene code. I  
> think it would be safe to read in chunks of up to 100Mb without a  
> problem and I don't think it will affect performance to any great  
> degree.

I agree.  Can you open a Jira issue and post a patch?

> It's pleasing to see that Lucene can easily handle such huge  
> indexes, although this bug is obviously quite an impediment to doing  
> so.

Yes indeed.  This is one crazy bug.
