From: "Rupinder Singh Mazara" <rsmazara@ebi.ac.uk>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Subject: RE: Lucene1.4.1 + OutOf Memory
Date: Wed, 10 Nov 2004 11:43:58 -0000

Karthik,

I think the core problem in your case is the use of compound files. It
would be best to switch them off, or alternatively to issue an optimize
as soon as the indexing is over.
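For illustration, a minimal sketch of both suggestions against the
Lucene 1.4 API (the class name and index path are placeholders, not
anything from this thread):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class NonCompoundIndexer {
  public static void main(String[] args) throws Exception {
    // "/tmp/citations-index" is a placeholder path; "true" creates a fresh index
    IndexWriter writer =
        new IndexWriter("/tmp/citations-index", new StandardAnalyzer(), true);
    writer.setUseCompoundFile(false); // plain per-segment files instead of .cfs
    // ... writer.addDocument(...) for each document ...
    writer.optimize(); // merge down to a single segment once indexing is done
    writer.close();
  }
}

Note the trade-off: the compound format exists to keep the number of
open files down, so switching it off eases the merge step at the cost
of more open file handles at search time.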
I am copying the file contents in between tags. The patch is to be
applied to TermInfosReader.java; this was done to help with out of
memory exceptions while doing indexing.

Index: src/java/org/apache/lucene/index/TermInfosReader.java
===================================================================
RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/TermInfosReader.java,v
retrieving revision 1.9
diff -u -r1.9 TermInfosReader.java
--- src/java/org/apache/lucene/index/TermInfosReader.java  6 Aug 2004 20:50:29 -0000  1.9
+++ src/java/org/apache/lucene/index/TermInfosReader.java  10 Sep 2004 17:46:47 -0000
@@ -45,6 +45,11 @@
     readIndex();
   }

+  protected final void finalize() {
+    // patch for pre-1.4.2 JVMs, whose ThreadLocals leak
+    enumerators.set(null);
+  }
+
   public int getSkipInterval() {
     return origEnum.skipInterval;
   }

However, Tomcat does react in strange ways to too many open files. Try
to restrict the number of IndexReader or Searchable objects that you
create while doing searches; I usually keep one object to handle all my
user requests:

public static Searcher fetchCitationSearcher(HttpServletRequest request)
    throws Exception {
  // look up the Searcher shared application-wide via the ServletContext
  Searcher rval = (Searcher) request.getSession().getServletContext()
      .getAttribute("luceneSearchable");
  if (rval == null) {
    // first request: open the searcher once and cache it for everyone
    rval = new IndexSearcher(fetchCitationReader(request));
    request.getSession().getServletContext()
        .setAttribute("luceneSearchable", rval);
  }
  return rval;
}
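To show the intended use, here is an illustrative servlet (a sketch,
not from the original mail) that calls the helper on every request so
the whole webapp shares one set of open index files. The servlet class,
the "contents" field name, and the "query" request parameter are
assumptions; it also assumes fetchCitationSearcher(...) above is
defined on the same class:

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

public class CitationSearchServlet extends HttpServlet {
  public void doGet(HttpServletRequest request, HttpServletResponse response)
      throws ServletException, IOException {
    try {
      // every request reuses the single cached Searcher
      Searcher searcher = fetchCitationSearcher(request);
      // "contents" is a placeholder field name
      Query query = QueryParser.parse(request.getParameter("query"),
                                      "contents", new StandardAnalyzer());
      Hits hits = searcher.search(query);
      response.getWriter().println(hits.length() + " matching citations");
    } catch (Exception e) {
      throw new ServletException(e);
    }
  }
  // fetchCitationSearcher(...) as defined above
}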
>-----Original Message-----
>From: Karthik N S [mailto:karthik@controlnet.co.in]
>Sent: 10 November 2004 11:41
>To: Lucene Users List
>Subject: RE: Lucene1.4.1 + OutOf Memory
>
>Hi Rupinder Singh Mazara,
>
>Apologies............
>
>Can you paste the code into the mail instead of an attachment?
>[ Because I am not able to get the attachment on the company's mail. ]
>
>Thanks in advance,
>Karthik
>
>-----Original Message-----
>From: Rupinder Singh Mazara [mailto:rsmazara@ebi.ac.uk]
>Sent: Wednesday, November 10, 2004 3:10 PM
>To: Lucene Users List
>Subject: RE: Lucene1.4.1 + OutOf Memory
>
>Hi all,
>
>I had a similar problem with jdk1.4.1. Doug had sent me a patch, which
>I am attaching; the following is the mail from Doug:
>
>It sounds like the ThreadLocal in TermInfosReader is not getting
>correctly garbage collected when the TermInfosReader is collected.
>Researching a bit, this was a bug in JVMs prior to 1.4.2, so my guess
>is that you're running in an older JVM. Is that right?
>
>I've attached a patch which should fix this. Please tell me if it
>works for you.
>
>Doug
>
>Daniel Taurat wrote:
>> Okay, that (1.4rc3) worked fine, too!
>> Got only 257 SegmentTermEnums for 1900 objects.
>>
>> Now I will go for the final test on the production server with the
>> 1.4rc3 version and about 40,000 objects.
>>
>> Daniel
>>
>> Daniel Taurat schrieb:
>>
>>> Hi all,
>>> here is some update for you:
>>> I switched back to Lucene 1.3-final, and now the number of
>>> SegmentTermEnum objects is controlled by gc again:
>>> it goes up to about 1000 and then it is down again to 254 after
>>> indexing my 1900 test objects.
>>> Stay tuned, I will try 1.4RC3 now, the last version before
>>> FieldCache was introduced...
>>>
>>> Daniel
>>>
>>> Rupinder Singh Mazara schrieb:
>>>
>>>> Hi all,
>>>> I had a similar problem: I have a database of documents with 24
>>>> fields and an average content of 7K, with 16M+ records.
>>>>
>>>> I had to split the job into slabs of 1M each and merge the
>>>> resulting indexes. Submissions to our job queue looked like:
>>>>
>>>> java -Xms100M -Xcompactexplicitgc -cp $CLASSPATH lucene.Indexer 22
>>>>
>>>> and I still had an OutOfMemory exception. The solution that I
>>>> created was, after every 200K documents, to create a temp
>>>> directory and merge the indexes together. This was done for the
>>>> first production run; updates are now being handled incrementally.
>>>>
>>>> Exception in thread "main" java.lang.OutOfMemoryError
>>>>   at org.apache.lucene.store.RAMOutputStream.flushBuffer(RAMOutputStream.java(Compiled Code))
>>>>   at org.apache.lucene.store.OutputStream.flush(OutputStream.java(Inlined Compiled Code))
>>>>   at org.apache.lucene.store.OutputStream.writeByte(OutputStream.java(Inlined Compiled Code))
>>>>   at org.apache.lucene.store.OutputStream.writeBytes(OutputStream.java(Compiled Code))
>>>>   at org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java(Compiled Code))
>>>>   at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java(Compiled Code))
>>>>   at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java(Compiled Code))
>>>>   at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java(Compiled Code))
>>>>   at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java(Compiled Code))
>>>>   at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:366)
>>>>   at lucene.Indexer.doIndex(CDBIndexer.java(Compiled Code))
>>>>   at lucene.Indexer.main(CDBIndexer.java:168)
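A minimal sketch of the slab-and-merge step described in the quoted
message above, against the Lucene 1.4 API. This is assumed illustration
code, not the actual lucene.Indexer; the paths and the class name are
placeholders:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SlabMerger {
  // merge several already-built slab indexes into one target index
  public static void merge(String[] slabPaths, String targetPath)
      throws Exception {
    Directory[] slabs = new Directory[slabPaths.length];
    for (int i = 0; i < slabPaths.length; i++) {
      slabs[i] = FSDirectory.getDirectory(slabPaths[i], false); // open existing slab
    }
    IndexWriter writer =
        new IndexWriter(targetPath, new StandardAnalyzer(), true);
    writer.setUseCompoundFile(false); // sidestep the CompoundFileWriter step in the trace above
    writer.addIndexes(slabs);         // merges and optimizes the slabs into the target
    writer.close();
  }
}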
>>>>> -----Original Message-----
>>>>> From: Daniel Taurat [mailto:daniel.taurat@gaussvip.com]
>>>>> Sent: 10 September 2004 14:42
>>>>> To: Lucene Users List
>>>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large
>>>>> number of documents
>>>>>
>>>>> Hi Pete,
>>>>> good hint, but we actually do have 4Gb of physical memory on the
>>>>> system. But then: we have also experienced that the gc of the IBM
>>>>> jdk1.3.1 that we use sometimes behaves strangely with too large a
>>>>> heap space anyway (the limit seems to be 1.2Gb).
>>>>> I can say that gc is not collecting these objects, since I forced
>>>>> gc runs every now and then while indexing (when parsing pdf-type
>>>>> objects, that is): no effect.
>>>>>
>>>>> regards,
>>>>>
>>>>> Daniel
>>>>>
>>>>> Pete Lewis wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Reading the thread with interest, there is another way I've come
>>>>>> across out of memory errors when indexing large batches of
>>>>>> documents.
>>>>>>
>>>>>> If you have your heap space settings too high, then you get
>>>>>> swapping (which impacts performance), plus you never reach the
>>>>>> trigger for garbage collection; hence you don't garbage collect,
>>>>>> and hence you run out of memory.
>>>>>>
>>>>>> Can you check whether or not your garbage collection is being
>>>>>> triggered?
>>>>>>
>>>>>> Paradoxically, therefore, if this is the case, by reducing the
>>>>>> heap space you can improve performance and get rid of the out of
>>>>>> memory errors.
>>>>>>
>>>>>> Cheers,
>>>>>> Pete Lewis
>>>>>>
>>>>>> ----- Original Message -----
>>>>>> From: "Daniel Taurat"
>>>>>> To: "Lucene Users List"
>>>>>> Sent: Friday, September 10, 2004 1:10 PM
>>>>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing
>>>>>> large number of documents
>>>>>>
>>>>>>> Daniel Aber schrieb:
>>>>>>>
>>>>>>>> On Thursday 09 September 2004 19:47, Daniel Taurat wrote:
>>>>>>>>
>>>>>>>>> I am facing an out of memory problem using Lucene 1.4.1.
>>>>>>>>
>>>>>>>> Could you try with a recent CVS version? There has been a fix
>>>>>>>> about files not being deleted after 1.4.1. Not sure if that
>>>>>>>> could cause the problems you're experiencing.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Daniel
>>>>>>>
>>>>>>> Well, it seems not to be files; it looks more like those
>>>>>>> SegmentTermEnum objects accumulating in memory.
>>>>>>> I've seen some discussion on these objects in the developer
>>>>>>> newsgroup that had taken place some time ago.
>>>>>>> I am afraid this is some kind of runaway caching I have to deal
>>>>>>> with. Maybe not correctly addressed in this newsgroup, after
>>>>>>> all...
>>>>>>>
>>>>>>> Anyway: any idea if there is an API command to re-init caches?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Daniel
>
>>-----Original Message-----
>>From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
>>Sent: 10 November 2004 09:35
>>To: Lucene Users List
>>Subject: Re: Lucene1.4.1 + OutOf Memory
>>
>>On Nov 10, 2004, at 1:55 AM, Karthik N S wrote:
>>>
>>> Hi Guys
>>>
>>> Apologies..........
>>
>>No need to apologize for asking questions.
>>
>>> History
>>>
>>> 1st type : 40000 subindexes + MultiSearcher + Search on Content
>>> Field
>>
>>You've got 40,000 indexes aggregated under a MultiSearcher and you're
>>wondering why you're running out of memory?! :O
>>
>>> Exception [ Too many Files Open ]
>>
>>Are you using the compound file format?
>>
>> Erik
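For context, an illustrative sketch (assumed, not from the thread) of
the setup Erik is reacting to: every subindex under a MultiSearcher is
its own IndexSearcher with its own open files, so 40,000 of them
multiply the file-handle count. Paths and the class name are
placeholders:

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.Searcher;

public class AggregateSearch {
  public static Searcher open(String[] indexPaths) throws Exception {
    // Each subindex keeps its own set of files open; with 40,000 of
    // them, even a few files per index exhausts the OS file descriptor
    // limit unless the compound format (one .cfs per segment) is used.
    Searchable[] searchables = new Searchable[indexPaths.length];
    for (int i = 0; i < indexPaths.length; i++) {
      searchables[i] = new IndexSearcher(indexPaths[i]);
    }
    return new MultiSearcher(searchables);
  }
}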
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org