From: "Karthik N S"
To: "Lucene Users List"
Subject: RE: Lucene1.4.1 + OutOf Memory
Date: Wed, 10 Nov 2004 17:11:09 +0530

Hi Rupinder Singh Mazara,

Apologies... Can you paste the code into the mail instead of sending it as an
attachment? [I am not able to receive the attachment on the company's mail.]

Thanks in advance,
Karthik

-----Original Message-----
From: Rupinder Singh Mazara [mailto:rsmazara@ebi.ac.uk]
Sent: Wednesday, November 10, 2004 3:10 PM
To: Lucene Users List
Subject: RE: Lucene1.4.1 + OutOf Memory

hi all

I had a similar problem with jdk1.4.1. Doug had sent me a patch, which I am
attaching; the following is the mail from Doug:

It sounds like the ThreadLocal in TermInfosReader is not getting correctly
garbage collected when the TermInfosReader is collected. Researching a bit,
this was a bug in JVMs prior to 1.4.2, so my guess is that you're running an
older JVM. Is that right?

I've attached a patch which should fix this. Please tell me if it works for
you.

Doug
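The patch itself was sent as an attachment and is not reproduced in this
mail. A minimal sketch of the idea Doug describes -- explicitly releasing the
per-thread SegmentTermEnum when the reader is closed, instead of relying on
the ThreadLocal cleanup that is buggy on JVMs before 1.4.2 -- might look like
the following; the class and member names are illustrative, not the actual
patch:

    // Sketch only -- not Doug's patch. The point is to drop the per-thread
    // cached enumerator explicitly on close(), so the pre-1.4.2 ThreadLocal
    // bug cannot keep the SegmentTermEnum (and its buffers) reachable.
    final class TermInfosReaderSketch {
        // one cached SegmentTermEnum per thread, as TermInfosReader keeps
        private final ThreadLocal enumerators = new ThreadLocal();

        void close() {
            // clear this thread's cached value so the ThreadLocal map no
            // longer references it after the reader is closed
            enumerators.set(null);
        }
    }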
Daniel Taurat wrote:
> Okay, that (1.4rc3) worked fine, too!
> Got only 257 SegmentTermEnums for 1900 objects.
>
> Now I will go for the final test on the production server with the
> 1.4rc3 version and about 40,000 objects.
>
> Daniel
>
> Daniel Taurat schrieb:
>
>> Hi all,
>> here is some update for you:
>> I switched back to Lucene 1.3-final and now the number of
>> SegmentTermEnum objects is controlled by gc again:
>> it goes up to about 1000 and then drops back to 254 after indexing my
>> 1900 test objects.
>> Stay tuned, I will try 1.4RC3 now, the last version before FieldCache
>> was introduced...
>>
>> Daniel
>>
>> Rupinder Singh Mazara schrieb:
>>
>>> hi all
>>> I had a similar problem: I have a database of documents with 24
>>> fields, an average content size of 7K, and 16M+ records.
>>>
>>> I had to split the job into slabs of 1M each and merge the resulting
>>> indexes. Submissions to our job queue looked like:
>>>
>>>   java -Xms100M -Xcompactexplicitgc -cp $CLASSPATH lucene.Indexer 22
>>>
>>> I still got an OutOfMemoryError. The solution I came up with was to
>>> create a temp directory after every 200K documents and merge them
>>> together; that was done for the first production run, and updates are
>>> now being handled incrementally.
>>>
>>> Exception in thread "main" java.lang.OutOfMemoryError
>>>   at org.apache.lucene.store.RAMOutputStream.flushBuffer(RAMOutputStream.java(Compiled Code))
>>>   at org.apache.lucene.store.OutputStream.flush(OutputStream.java(Inlined Compiled Code))
>>>   at org.apache.lucene.store.OutputStream.writeByte(OutputStream.java(Inlined Compiled Code))
>>>   at org.apache.lucene.store.OutputStream.writeBytes(OutputStream.java(Compiled Code))
>>>   at org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java(Compiled Code))
>>>   at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java(Compiled Code))
>>>   at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java(Compiled Code))
>>>   at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java(Compiled Code))
>>>   at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java(Compiled Code))
>>>   at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:366)
>>>   at lucene.Indexer.doIndex(CDBIndexer.java(Compiled Code))
>>>   at lucene.Indexer.main(CDBIndexer.java:168)
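The indexer Rupinder refers to was attached rather than pasted. A rough
sketch of the slab-and-merge approach he describes, written against the
Lucene 1.4-era API; the paths, field name, and document source below are
assumptions for illustration, not his actual lucene.Indexer:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class SlabIndexer {

        static final int SLAB_SIZE = 200000;   // start a fresh temp index every 200K docs

        public static void main(String[] args) throws Exception {
            java.util.List slabDirs = new java.util.ArrayList();
            IndexWriter writer = null;

            for (int i = 0; i < totalDocs(); i++) {
                if (i % SLAB_SIZE == 0) {
                    if (writer != null) writer.close();   // finish the previous slab
                    Directory slab =
                        FSDirectory.getDirectory("/tmp/slab-" + (i / SLAB_SIZE), true);
                    slabDirs.add(slab);
                    writer = new IndexWriter(slab, new StandardAnalyzer(), true);
                }
                Document doc = new Document();
                doc.add(Field.Text("contents", fetchContents(i)));  // hypothetical data source
                writer.addDocument(doc);
            }
            if (writer != null) writer.close();

            // Merge all slabs into the final index. Keeping each slab small means
            // no single merge/optimize has to chew through millions of documents
            // in one RAM-hungry pass.
            IndexWriter merged = new IndexWriter(
                    FSDirectory.getDirectory("/data/final-index", true),
                    new StandardAnalyzer(), true);
            merged.addIndexes((Directory[]) slabDirs.toArray(new Directory[slabDirs.size()]));
            merged.optimize();
            merged.close();
        }

        // Placeholders standing in for the caller's own record source (assumptions).
        static int totalDocs() { return 1000000; }
        static String fetchContents(int i) { return "document body " + i; }
    }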
>>>> -----Original Message-----
>>>> From: Daniel Taurat [mailto:daniel.taurat@gaussvip.com]
>>>> Sent: 10 September 2004 14:42
>>>> To: Lucene Users List
>>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large number of documents
>>>>
>>>> Hi Pete,
>>>> good hint, but we actually do have 4 GB of physical memory on the
>>>> system. Then again, we have also seen the gc of the IBM jdk1.3.1 we
>>>> use behave strangely with too large a heap anyway (the limit seems to
>>>> be 1.2 GB).
>>>> I can say that gc is not collecting these objects, since I forced gc
>>>> runs every now and then while indexing (when parsing pdf-type objects,
>>>> that is): no effect.
>>>>
>>>> regards,
>>>>
>>>> Daniel
>>>>
>>>> Pete Lewis wrote:
>>>>
>>>>> Hi all
>>>>>
>>>>> Reading the thread with interest; there is another way I've come
>>>>> across out of memory errors when indexing large batches of documents.
>>>>>
>>>>> If you have your heap space settings too high, then you get swapping
>>>>> (which impacts performance), plus you never reach the trigger for
>>>>> garbage collection, hence you don't garbage collect and hence you run
>>>>> out of memory.
>>>>>
>>>>> Can you check whether or not your garbage collection is being
>>>>> triggered?
>>>>>
>>>>> Counter-intuitively, therefore, if this is the case, reducing the
>>>>> heap space can both improve performance and get rid of the out of
>>>>> memory errors.
>>>>>
>>>>> Cheers
>>>>> Pete Lewis
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Daniel Taurat"
>>>>> To: "Lucene Users List"
>>>>> Sent: Friday, September 10, 2004 1:10 PM
>>>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large number of documents
>>>>>
>>>>>> Daniel Aber schrieb:
>>>>>>
>>>>>>> On Thursday 09 September 2004 19:47, Daniel Taurat wrote:
>>>>>>>
>>>>>>>> I am facing an out of memory problem using Lucene 1.4.1.
>>>>>>>
>>>>>>> Could you try with a recent CVS version? There has been a fix about
>>>>>>> files not being deleted after 1.4.1. Not sure if that could cause
>>>>>>> the problems you're experiencing.
>>>>>>>
>>>>>>> Regards
>>>>>>> Daniel
>>>>>>
>>>>>> Well, it seems not to be files; it looks more like those
>>>>>> SegmentTermEnum objects accumulating in memory.
>>>>>> I've seen some discussion of these objects in the developer
>>>>>> newsgroup that took place some time ago.
>>>>>> I am afraid this is some kind of runaway caching I have to deal
>>>>>> with. Maybe not correctly addressed in this newsgroup, after all...
>>>>>>
>>>>>> Anyway: any idea if there is an API command to re-init caches?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Daniel

> -----Original Message-----
> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> Sent: 10 November 2004 09:35
> To: Lucene Users List
> Subject: Re: Lucene1.4.1 + OutOf Memory
>
> On Nov 10, 2004, at 1:55 AM, Karthik N S wrote:
>>
>> Hi Guys
>>
>> Apologies..........
>
> No need to apologize for asking questions.
>
>> History
>>
>> 1st type: 40000 subindexes + MultiSearcher + Search on Content Field
>
> You've got 40,000 indexes aggregated under a MultiSearcher and you're
> wondering why you're running out of memory?! :O
>
>> Exception [ Too many Files Open ]
>
> Are you using the compound file format?
>
>       Erik
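For reference, Erik's last question points at the compound file format: it
packs most of a segment's files into a single .cfs file, which keeps the
number of open file handles down when thousands of sub-indexes are searched
through one MultiSearcher. A minimal sketch against the Lucene 1.4-era API;
the index path and analyzer are illustrative assumptions:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class CompoundFileExample {
        public static void main(String[] args) throws Exception {
            // Write (or rewrite) one sub-index using the compound format so
            // each segment is stored as a single .cfs file instead of a
            // dozen small per-segment files.
            IndexWriter writer = new IndexWriter("/data/subindex-0001",
                    new StandardAnalyzer(), true);
            writer.setUseCompoundFile(true);
            // ... writer.addDocument(...) calls go here ...
            writer.optimize();   // fewer segments also means fewer files to keep open
            writer.close();
        }
    }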