From: Michael McCandless
To: general@lucene.apache.org
Subject: Re: problems with large Lucene index (reason found)
Date: Fri, 13 Mar 2009 05:55:43 -0400

Gak, what a horrible bug!  It seems widespread (JRE 1.5 and 1.6, on
both Linux and Windows), and it's been open for almost 2.5 years.  I
just added a comment and voted for the bug.

Does it also occur on a 64-bit JRE?

If you still allocate the full array, but read several smaller chunks
into it, do you still hit the bug?

Mike

lucene@digiatlas.org wrote:

> I now know the cause of the problem.  Increasing heap space actually
> breaks Lucene when reading large indexes.
>
> Details on why can be found here:
>
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546
>
> Lucene is trying to read a huge block (about 2580Mb) at the point of
> failure.  It allocates the required bytes in
> MultiSegmentReader.norms(), line 334, just fine; it is only when it
> attempts to use this array in a call to RandomAccessFile.readBytes()
> that it gets OOM.  This is caused by a bug in the native code for
> Java I/O.
>
> As observed in the bug report, a large heap actually causes the bug
> to appear.  When I reduced my heap from 1200M to 1000M, the exception
> was never generated, the code completed correctly, and it reported
> the correct number of search hits in the Hibernate Search version of
> my program.
>
> This isn't good - I need as much memory as possible, because I intend
> to run my search as a web service.
>
> The work-around would be to read the file in small chunks, but I am
> not familiar with the Lucene code, so I am unsure how that would be
> done in a global sense (i.e. does it really need to allocate a buffer
> of that size in MultiSegmentReader?).
>
> The obvious solution (which I haven't tried yet) would be to patch
> the point in FSDirectory where the Java I/O read occurs - looping
> with a small buffer for the read and then concatenating the result
> back into Lucene's byte array.
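A minimal sketch of the chunked-read approach both messages describe
(Mike's experiment above, and the FSDirectory patch proposed here):
the full array is still allocated, but each native read is bounded,
so the buffer behind RandomAccessFile.readBytes() stays small.
Untested; the class name and chunk size are illustrative, not taken
from this thread.

    import java.io.IOException;
    import java.io.RandomAccessFile;

    public class ChunkedRead {

        private static final int CHUNK_SIZE = 1 << 20; // 1 MB per native read

        // Fill b[offset..offset+len) from the file's current position,
        // issuing many small reads instead of one huge one.
        public static void readFully(RandomAccessFile file, byte[] b,
                                     int offset, int len) throws IOException {
            int total = 0;
            while (total < len) {
                int toRead = Math.min(CHUNK_SIZE, len - total);
                int n = file.read(b, offset + total, toRead);
                if (n == -1) {
                    throw new IOException("read past EOF");
                }
                total += n;
            }
        }
    }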
> Thanks for the comments on this problem from people on this list.
>
> Quoting Ted Dunning:
>
>> try running with verbose gc.  That will give you more details about
>> what is happening.
>>
>> Even better, run with jconsole on the side so that you get really
>> detailed information on memory pools.
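Ted's suggestion might look like this on the command line (a sketch:
the class name, heap size, and pid are placeholders; the GC flags are
standard HotSpot options):

    java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
         -Xmx1200m MySearchApp
    jconsole <pid-of-the-search-process>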
>> On Thu, Mar 12, 2009 at 7:30 AM, lucene@digiatlas.org wrote:
>>
>>> Thanks Mike and Jokin for your comments on the memory problem.  I
>>> have submitted the query to the Hibernate Search list, although I
>>> haven't seen a response yet.
>>>
>>> In the meantime I did my own investigating in the code (I'd rather
>>> have avoided this!).  I'm seeing results that don't make any sense,
>>> and maybe someone here with more experience of Lucene and of the
>>> way memory is allocated by the JVM can shed light on what are, to
>>> me, quite illogical observations.
>>>
>>> As you may recall, I have a stand-alone Lucene search and a
>>> Hibernate Search version.  Looking in the HS code did not shed any
>>> light on the issue, so I took my stand-alone Lucene code, put it in
>>> a method, and replaced the search in the HS class (the constructor
>>> of QueryHits.java) with a call to my method.  Bear in mind this
>>> method is the same code as posted in my earlier message - it sets
>>> up the Lucene search from scratch (i.e. no data structures created
>>> by HS were used).  So, effectively, I was calling my stand-alone
>>> code after any setup done by Hibernate and after any memory it may
>>> have allocated (which turned out to be a few Mb).
>>>
>>> I get OOM!  Printing the free memory at this point shows bags of
>>> memory left - indeed, the same free memory (+/- a few Mb) as in the
>>> stand-alone Lucene version!
>>>
>>> I then instrumented the Lucene method where the OOM is occurring
>>> (FSDirectory.readInternal()).  I cannot understand the results I
>>> am seeing.  Below is a snippet of the output of each run, with the
>>> code around FSDirectory line 598 as follows:
>>>
>>>   ...
>>>   do {
>>>     long tot = Runtime.getRuntime().totalMemory();
>>>     long free = Runtime.getRuntime().freeMemory();
>>>     System.out.println("LUCENE: offset=" + offset + " total=" + total
>>>         + " len-total=" + (len - total) + " free mem=" + free
>>>         + " used=" + (tot - free));
>>>     int i = file.read(b, offset + total, len - total);
>>>   ...
>>>
>>> The stand-alone version:
>>>
>>> ...
>>> LUCENE: offset=0 total=0 len-total=401 free mem=918576864 used=330080544
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used=330080544
>>> LUCENE: offset=0 total=0 len-total=883 free mem=918576864 used=330080544
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used=330080544
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used=330080544
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used=330080544
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used=330080544
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used=330080544
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used=330080544
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used=330080544
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=918576864 used=330080544
>>> LUCENE: offset=0 total=0 len-total=209000000 free mem=631122912 used=617534496
>>> LUCENE: offset=209000000 total=0 len-total=20900000 free mem=631122912 used=617534496
>>> LUCENE: offset=229900000 total=0 len-total=20900000 free mem=631122912 used=617534496
>>> LUCENE: offset=250800000 total=0 len-total=20900000 free mem=631122912 used=617534496
>>> ...
>>> completes successfully!
>>>
>>> The method called via Hibernate Search:
>>>
>>> ...
>>> LUCENE: offset=0 total=0 len-total=401 free mem=924185480 used=334892152
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used=334892152
>>> LUCENE: offset=0 total=0 len-total=883 free mem=924185480 used=334892152
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used=334892152
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used=334892152
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used=334892152
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used=334892152
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used=334892152
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used=334892152
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used=334892152
>>> LUCENE: offset=0 total=0 len-total=1024 free mem=924185480 used=334892152
>>> LUCENE: offset=0 total=0 len-total=209000000 free mem=636731528 used=622346104
>>> Exception in thread "main" java.lang.OutOfMemoryError
>>>         at java.io.RandomAccessFile.readBytes(Native Method)
>>>         at java.io.RandomAccessFile.read(Unknown Source)
>>>         at org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:599)
>>> ... fails with exception!
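For reference, the failure mode described in bug 6478546 reduces to
something like the sketch below: the Java-side allocation succeeds,
but the single native readBytes() call can then fail, reportedly
because a large Java heap leaves too little native (C-heap) address
space for the buffer that readBytes() allocates internally.  Untested;
the path is a placeholder, and the size merely matches the
209000000-byte read in the log above.

    import java.io.RandomAccessFile;

    public class BigReadRepro {
        public static void main(String[] args) throws Exception {
            // ~200 MB; the Java heap allocation itself succeeds
            byte[] buf = new byte[209000000];
            RandomAccessFile f = new RandomAccessFile("/path/to/big-index-file", "r");
            try {
                // one native call for the whole length; per the bug
                // report, this is where the OutOfMemoryError surfaces
                // when run with a large -Xmx on an affected JRE
                f.read(buf, 0, buf.length);
            } finally {
                f.close();
            }
        }
    }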
>>>
>>> Note that the HS version has slightly more free memory because I
>>> ran it with -Xms1210M, as opposed to -Xms1200M for the stand-alone
>>> version, to offset any memory used by HS when it starts up.
>>>
>>> As you can see, the two runs are identical for all practical
>>> purposes.  So what gives?
>>>
>>> I'm stumped, so any suggestions appreciated.
>>>
>>> Thanks.
>>>
>>> Quoting Michael McCandless:
>>>
>>>> Unfortunately, I'm not familiar with exactly what Hibernate Search
>>>> does with the Lucene APIs.
>>>>
>>>> It must be doing something beyond what your standalone Lucene test
>>>> case does.
>>>>
>>>> Maybe ask this question on the Hibernate list?
>>>>
>>>> Mike
>>
>> --
>> Ted Dunning, CTO
>> DeepDyve
>>
>> 111 West Evelyn Ave., Ste. 202
>> Sunnyvale, CA 94086
>> www.deepdyve.com
>> 408-773-0110 ext. 738
>> 858-414-0013 (m)
>> 408-773-0220 (fax)