From: "Rupinder Singh Mazara"
To: "Lucene Users List"
Subject: RE: Lucene1.4.1 + OutOf Memory
Date: Wed, 10 Nov 2004 09:39:30 -0000

hi all

I had a similar problem with jdk1.4.1. Doug had sent me a patch, which I am
attaching; the mail from Doug follows (a sketch of the general pattern it
describes appears after the quoted thread below):

  It sounds like the ThreadLocal in TermInfosReader is not getting correctly
  garbage collected when the TermInfosReader is collected. Researching a bit,
  this was a bug in JVMs prior to 1.4.2, so my guess is that you're running
  an older JVM. Is that right?

  I've attached a patch which should fix this. Please tell me if it works
  for you.

  Doug

Daniel Taurat wrote:
> Okay, that (1.4rc3) worked fine, too!
> Got only 257 SegmentTermEnums for 1900 objects.
>
> Now I will go for the final test on the production server with the
> 1.4rc3 version and about 40,000 objects.
>
> Daniel
>
> Daniel Taurat wrote:
>
>> Hi all,
>> here is some update for you:
>> I switched back to Lucene 1.3-final, and now the number of
>> SegmentTermEnum objects is controlled by gc again:
>> it goes up to about 1000 and then drops back to 254 after
>> indexing my 1900 test objects.
>> Stay tuned, I will try 1.4RC3 now, the last version before FieldCache
>> was introduced...
>>
>> Daniel
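In case the attachment does not come through the archive: the general pattern
Doug's note describes is to clear the per-thread cache explicitly when the
owning reader is closed, so that pre-1.4.2 JVMs actually release the cached
value. This is only a sketch with illustrative names (PerThreadCachingReader,
getEnumerator), not the actual TermInfosReader patch:

    // Sketch of the general workaround pattern only -- not the real patch.
    public class PerThreadCachingReader {

        // one cached enumerator per thread, created lazily
        private final ThreadLocal cache = new ThreadLocal();

        public Object getEnumerator() {
            Object e = cache.get();
            if (e == null) {
                e = createEnumerator();   // expensive per-thread setup, done once
                cache.set(e);
            }
            return e;
        }

        public void close() {
            // Drop the calling thread's cached value explicitly instead of
            // waiting for the JVM: JVMs before 1.4.2 had trouble collecting
            // ThreadLocal values once their owner became unreachable.
            cache.set(null);
        }

        private Object createEnumerator() {
            return new Object();          // placeholder for the real per-thread object
        }
    }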
>>
>> Rupinder Singh Mazara wrote:
>>
>>> hi all
>>> I had a similar problem: I have a database of documents with 24
>>> fields, an average content of 7K, and 16M+ records.
>>>
>>> I had to split the job into slabs of 1M each and merge the
>>> resulting indexes; submissions to our job queue looked like
>>>
>>>   java -Xms100M -Xcompactexplicitgc -cp $CLASSPATH lucene.Indexer 22
>>>
>>> and I still had OutOfMemory exceptions. The solution I came up with
>>> was, after every 200K documents, to create a temp directory and merge
>>> the indexes together (a sketch of this batch-and-merge approach is
>>> shown below). This was done for the first production run; updates
>>> are now being handled incrementally.
>>>
>>> Exception in thread "main" java.lang.OutOfMemoryError
>>>   at org.apache.lucene.store.RAMOutputStream.flushBuffer(RAMOutputStream.java(Compiled Code))
>>>   at org.apache.lucene.store.OutputStream.flush(OutputStream.java(Inlined Compiled Code))
>>>   at org.apache.lucene.store.OutputStream.writeByte(OutputStream.java(Inlined Compiled Code))
>>>   at org.apache.lucene.store.OutputStream.writeBytes(OutputStream.java(Compiled Code))
>>>   at org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java(Compiled Code))
>>>   at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java(Compiled Code))
>>>   at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java(Compiled Code))
>>>   at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java(Compiled Code))
>>>   at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java(Compiled Code))
>>>   at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:366)
>>>   at lucene.Indexer.doIndex(CDBIndexer.java(Compiled Code))
>>>   at lucene.Indexer.main(CDBIndexer.java:168)
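A minimal sketch of that batch-and-merge approach, assuming the Lucene 1.4-era
API; the DocumentSource type, the BatchIndexer class name, and the /tmp/batch-N
paths are placeholders, not part of Lucene:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class BatchIndexer {

        /** Placeholder for whatever supplies the documents (e.g. a database cursor). */
        public interface DocumentSource {
            boolean hasMore();
            Document next();
        }

        private static final int BATCH_SIZE = 200000;   // documents per temporary index

        public static void index(String mainPath, DocumentSource source) throws Exception {
            IndexWriter main = new IndexWriter(mainPath, new StandardAnalyzer(), true);
            int batch = 0;
            while (source.hasMore()) {
                // build one temporary index holding at most BATCH_SIZE documents
                Directory tempDir = FSDirectory.getDirectory("/tmp/batch-" + batch, true);
                IndexWriter temp = new IndexWriter(tempDir, new StandardAnalyzer(), true);
                for (int i = 0; i < BATCH_SIZE && source.hasMore(); i++) {
                    temp.addDocument(source.next());
                }
                temp.close();

                // fold the finished batch into the main index, then move on;
                // only the main writer and one small batch are open at a time
                main.addIndexes(new Directory[] { tempDir });
                batch++;
            }
            main.optimize();
            main.close();
        }
    }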
>>>> -----Original Message-----
>>>> From: Daniel Taurat [mailto:daniel.taurat@gaussvip.com]
>>>> Sent: 10 September 2004 14:42
>>>> To: Lucene Users List
>>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large
>>>> number of documents
>>>>
>>>> Hi Pete,
>>>> good hint, but we actually do have physical memory of 4Gb on the
>>>> system. But then: we have also experienced that the gc of the ibm
>>>> jdk1.3.1 that we use sometimes behaves strangely with too large a
>>>> heap space anyway (the limit seems to be 1.2 Gb).
>>>> I can say that gc is not collecting these objects, since I forced gc
>>>> runs every now and then while indexing (when parsing pdf-type
>>>> objects, that is): no effect.
>>>>
>>>> regards,
>>>>
>>>> Daniel
>>>>
>>>> Pete Lewis wrote:
>>>>
>>>>> Hi all
>>>>>
>>>>> Reading the thread with interest, there is another way I've come
>>>>> across out of memory errors when indexing large batches of documents.
>>>>>
>>>>> If you have your heap space settings too high, then you get swapping
>>>>> (which impacts performance), plus you never reach the trigger for
>>>>> garbage collection, hence you don't garbage collect and hence you run
>>>>> out of memory.
>>>>>
>>>>> Can you check whether or not your garbage collection is being
>>>>> triggered?
>>>>>
>>>>> Counter-intuitively, therefore, if this is the case, reducing the heap
>>>>> space can improve performance and get rid of the out of memory errors.
>>>>>
>>>>> Cheers
>>>>> Pete Lewis
>>>>>
>>>>> ----- Original Message ----- From: "Daniel Taurat"
>>>>> To: "Lucene Users List"
>>>>> Sent: Friday, September 10, 2004 1:10 PM
>>>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large
>>>>> number of documents
>>>>>
>>>>>> Daniel Aber wrote:
>>>>>>
>>>>>>> On Thursday 09 September 2004 19:47, Daniel Taurat wrote:
>>>>>>>
>>>>>>>> I am facing an out of memory problem using Lucene 1.4.1.
>>>>>>>
>>>>>>> Could you try with a recent CVS version? There has been a fix
>>>>>>> about files not being deleted after 1.4.1. Not sure if that could
>>>>>>> cause the problems you're experiencing.
>>>>>>>
>>>>>>> Regards
>>>>>>> Daniel
>>>>>>
>>>>>> Well, it seems not to be files; it looks more like those
>>>>>> SegmentTermEnum objects accumulating in memory.
>>>>>> I've seen some discussion on these objects in the developer
>>>>>> newsgroup that took place some time ago.
>>>>>> I am afraid this is some kind of runaway caching I have to deal with.
>>>>>> Maybe not correctly addressed in this newsgroup, after all...
>>>>>>
>>>>>> Anyway: any idea if there is an API command to re-init caches?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Daniel

> -----Original Message-----
> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> Sent: 10 November 2004 09:35
> To: Lucene Users List
> Subject: Re: Lucene1.4.1 + OutOf Memory
>
> On Nov 10, 2004, at 1:55 AM, Karthik N S wrote:
>>
>> Hi Guys
>>
>> Apologies..........
>
> No need to apologize for asking questions.
>
>> History
>>
>> 1st type : 40000 subindexes + MultiSearcher + Search on Content Field
>
> You've got 40,000 indexes aggregated under a MultiSearcher and you're
> wondering why you're running out of memory?!  :O
>
>> Exception [ Too many Files Open ]
>
> Are you using the compound file format?
>
> Erik
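On Erik's compound file question: a minimal sketch, assuming the Lucene 1.4 API,
of switching an index over to the compound format so that each segment lives in
a single .cfs file instead of several per-segment files, which keeps the number
of open file descriptors down when many indexes are searched at once. The index
path and class name are placeholders:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class CompoundFileFixup {
        public static void main(String[] args) throws Exception {
            // open an existing sub-index (create == false)
            IndexWriter writer = new IndexWriter("/path/to/subindex",
                                                 new StandardAnalyzer(), false);
            writer.setUseCompoundFile(true);   // new segments use the .cfs format
            writer.optimize();                 // rewrites the existing segments as well
            writer.close();
        }
    }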
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org