From: David Spencer
To: Lucene Users List
Date: Fri, 02 Jul 2004 11:50:33 -0700
Subject: Re: Running OutOfMemory while optimizing and searching
Message-ID: <40E5AE79.3090702@tropo.com>

In theory this should not help, but just in case: the idea is to call
gc() periodically to "force" garbage collection. This is the code I use
to try to force it:

public static long gc() {
    long bef = mem();            // heap in use before collection
    System.gc();
    sleep(100);
    System.runFinalization();    // run any pending finalizers, then collect again
    sleep(100);
    System.gc();
    long aft = mem();            // heap in use after collection
    return aft - bef;            // change in used heap (negative if memory was freed)
}
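(For completeness: the snippet above assumes two small helpers, mem() and
sleep(), that aren't shown. Here is a minimal sketch of what they might look
like; this is illustrative only, not part of the original code.)

// Hypothetical helpers assumed by gc() above; a sketch, not the original code.
private static long mem() {
    Runtime rt = Runtime.getRuntime();
    return rt.totalMemory() - rt.freeMemory();   // bytes of heap currently in use
}

private static void sleep(long ms) {
    try {
        Thread.sleep(ms);                        // give the collector a moment to run
    } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();      // restore the interrupt flag
    }
}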
Mark Florence wrote:

> Thanks, Jim. I'm pretty sure I'm throwing OOM for real,
> and not because I've run out of file handles. I can easily
> recreate the latter condition, and it is always reported
> accurately. I've also monitored the OOM as it occurs using
> "top" and I can see memory usage climbing until it is
> exhausted -- if you will excuse the pun!
>
> I'm not familiar with the new compound file format. Where
> can I look to find more information?
>
> -- Mark
>
> -----Original Message-----
> From: James Dunn [mailto:james_h_dunn@yahoo.com]
> Sent: Friday, July 02, 2004 01:29 pm
> To: Lucene Users List
> Subject: Re: Running OutOfMemory while optimizing and searching
>
> Ah yes, I don't think I made that clear enough. From
> Mark's original post, I believe he mentioned that he
> used separate readers for each simultaneous query.
>
> His other issue was that he was getting an OOM during
> an optimize, even when he set the JVM heap to 2GB. He
> said his index was about 10.5GB spread over ~7000
> files on Linux.
>
> My guess is that OOM might actually be a "too many
> open files" error. I have seen that type of error
> being reported by the JVM as an OutOfMemory error on
> Linux before. I had the same problem but once I
> switched to the new Lucene compound file format, I
> haven't had that problem since.
>
> Mark, have you tried switching to the compound file
> format?
>
> Jim
>
> --- Doug Cutting wrote:
>> > What do your queries look like? The memory required
>> > for a query can be computed by the following equation:
>> >
>> > 1 Byte * Number of fields in your query * Number of
>> > docs in your index
>> >
>> > So if your query searches on all 50 fields of your 3.5
>> > Million document index then each search would take
>> > about 175MB. If your 3-4 searches run concurrently
>> > then that's about 525MB to 700MB chewed up at once.
>>
>> That's not quite right. If you use the same IndexSearcher (or
>> IndexReader) for all of the searches, then only 175MB are used. The
>> arrays in question (the norms) are read-only and can be shared by all
>> searches.
>>
>> In general, the amount of memory required is:
>>
>> 1 byte * Number of searchable fields in your index * Number of docs in
>> your index
>>
>> plus
>>
>> 1k bytes * number of terms in query
>>
>> plus
>>
>> 1k bytes * number of phrase terms in query
>>
>> The latter are for i/o buffers. There are a few other things, but these
>> are the major ones.
>>
>> Doug
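(To make the arithmetic quoted above concrete, here is a rough
back-of-the-envelope version of Doug's estimate in Java. The field and
document counts are the example figures from this thread, 50 fields and 3.5
million documents; the query term counts are assumed purely for illustration.)

// Back-of-the-envelope memory estimate per the formula quoted above.
// 50 fields and 3.5M docs are the example figures from this thread;
// the query term counts below are assumptions for illustration only.
public class MemoryEstimate {
    public static void main(String[] args) {
        long searchableFields = 50;         // searchable fields in the index
        long docs = 3500000L;               // documents in the index
        long queryTerms = 5;                // terms in the query (assumed)
        long phraseTerms = 0;               // phrase terms in the query (assumed)

        long norms = searchableFields * docs;               // 1 byte per field per doc (the norms)
        long buffers = 1024L * (queryTerms + phraseTerms);  // ~1 KB of i/o buffer per term

        long totalBytes = norms + buffers;
        System.out.println("Roughly " + (totalBytes / (1024 * 1024)) + " MB");
        // Prints roughly 166 MB (about 175 MB if you count 1 MB as 10^6 bytes).
        // Per Doug's note, the norms are shared across concurrent searches when
        // the same IndexSearcher/IndexReader is reused.
    }
}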