From: "Daniel Taurat" <daniel.taurat@gaussvip.com>
To: Lucene Users List <lucene-user@jakarta.apache.org>
Date: Fri, 10 Sep 2004 16:39:32 +0200
Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large number of documents

Hi all,

here is an update for you: I switched back to Lucene 1.3-final, and the
number of SegmentTermEnum objects is controlled by the gc again: it goes
up to about 1000 and then drops back to 254 after indexing my 1900 test
objects.

Stay tuned, I will try 1.4RC3 now, the last version before FieldCache
was introduced...
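The forced-gc check I do between batches looks roughly like this (a
minimal sketch only; the HeapWatch class and the label are illustrative
names, and System.gc() is of course just a request that the VM may
ignore):

// Minimal sketch: request a gc and log approximate heap use between
// indexing batches. HeapWatch and logHeapAndGc are made-up names.
public class HeapWatch {
    public static void logHeapAndGc(String label) {
        System.gc(); // only a request; the VM is free to ignore it
        Runtime rt = Runtime.getRuntime();
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        System.out.println(label + ": ~" + usedMb + " MB in use");
    }
}

Called every few hundred documents, this makes it easy to see whether
used memory falls back to a baseline after collection (as it now does
under 1.3-final) or keeps climbing.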
Daniel

Rupinder Singh Mazara schrieb:
> hi all
>
> I had a similar problem: I have a database of documents with 24 fields
> and an average content size of 7 KB, with 16M+ records. I had to split
> the job into slabs of 1M records each and merge the resulting indexes.
> Submissions to our job queue looked like:
>
>   java -Xms100M -Xcompactexplicitgc -cp $CLASSPATH lucene.Indexer 22
>
> and I still got OutOfMemory exceptions. The solution I came up with
> was to create a temp directory after every 200K documents and merge
> the partial indexes together. That was only needed for the first
> production run; updates are now handled incrementally.
>
> Exception in thread "main" java.lang.OutOfMemoryError
>   at org.apache.lucene.store.RAMOutputStream.flushBuffer(RAMOutputStream.java(Compiled Code))
>   at org.apache.lucene.store.OutputStream.flush(OutputStream.java(Inlined Compiled Code))
>   at org.apache.lucene.store.OutputStream.writeByte(OutputStream.java(Inlined Compiled Code))
>   at org.apache.lucene.store.OutputStream.writeBytes(OutputStream.java(Compiled Code))
>   at org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java(Compiled Code))
>   at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java(Compiled Code))
>   at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java(Compiled Code))
>   at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java(Compiled Code))
>   at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java(Compiled Code))
>   at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:366)
>   at lucene.Indexer.doIndex(CDBIndexer.java(Compiled Code))
>   at lucene.Indexer.main(CDBIndexer.java:168)
>
>> -----Original Message-----
>> From: Daniel Taurat [mailto:daniel.taurat@gaussvip.com]
>> Sent: 10 September 2004 14:42
>> To: Lucene Users List
>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large
>> number of documents
>>
>> Hi Pete,
>> good hint, but we actually do have 4 GB of physical memory on the
>> system. Then again, we have also seen the gc of the IBM JDK 1.3.1
>> we use behave strangely with too large a heap anyway (the limit
>> seems to be 1.2 GB).
>> I can say that gc is not collecting these objects, since I forced gc
>> runs every now and then while indexing (when parsing pdf-type
>> objects, that is): no effect.
>>
>> regards,
>>
>> Daniel
>>
>> Pete Lewis wrote:
>>
>>> Hi all
>>>
>>> Reading the thread with interest, there is another way I've come
>>> across out of memory errors when indexing large batches of
>>> documents.
>>>
>>> If you have your heap space settings too high, then you get swapping
>>> (which impacts performance), plus you never reach the trigger for
>>> garbage collection, hence you don't garbage collect and hence you
>>> run out of memory.
>>>
>>> Can you check whether or not your garbage collection is being
>>> triggered?
>>>
>>> Paradoxically, therefore, if this is the case, reducing the heap
>>> space can both improve performance and get rid of the out of memory
>>> errors.
>>>
>>> Cheers
>>> Pete Lewis
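For reference, Rupinder's slab-and-merge scheme could look roughly like
the following against the Lucene 1.4 API; the class name, path handling,
and analyzer choice are assumptions for illustration, not his actual
lucene.Indexer code:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Sketch: each slab of documents has already been indexed into its own
// temporary directory; this step merges the slabs into one final index.
public class SlabMerge {
    public static void mergeSlabs(String[] slabPaths, String finalPath)
            throws Exception {
        Directory[] slabs = new Directory[slabPaths.length];
        for (int i = 0; i < slabPaths.length; i++) {
            // false = open the existing slab index, don't create it
            slabs[i] = FSDirectory.getDirectory(slabPaths[i], false);
        }
        // true = create a fresh index at finalPath
        IndexWriter writer =
            new IndexWriter(finalPath, new StandardAnalyzer(), true);
        writer.addIndexes(slabs); // merges (and optimizes) the slab indexes
        writer.close();
    }
}

Indexing each slab with its own IndexWriter keeps the in-memory segment
state bounded; the final addIndexes() pass then folds the slab indexes
into a single optimized index.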
>>> ----- Original Message -----
>>> From: "Daniel Taurat"
>>> To: "Lucene Users List"
>>> Sent: Friday, September 10, 2004 1:10 PM
>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large
>>> number of documents
>>>
>>>> Daniel Aber schrieb:
>>>>
>>>>> On Thursday 09 September 2004 19:47, Daniel Taurat wrote:
>>>>>
>>>>>> I am facing an out of memory problem using Lucene 1.4.1.
>>>>>
>>>>> Could you try with a recent CVS version? There has been a fix
>>>>> about files not being deleted after 1.4.1. Not sure if that could
>>>>> cause the problems you're experiencing.
>>>>>
>>>>> Regards
>>>>> Daniel
>>>>
>>>> Well, it seems not to be files; it looks more like those
>>>> SegmentTermEnum objects accumulating in memory.
>>>> I've seen some discussion of these objects in the developer
>>>> newsgroup that took place some time ago.
>>>> I am afraid this is some kind of runaway caching I have to deal
>>>> with. Maybe not correctly addressed in this newsgroup, after all...
>>>>
>>>> Anyway: any idea if there is an API command to re-init the caches?
>>>>
>>>> Thanks,
>>>>
>>>> Daniel

--
Kind regards

Dr. Daniel Taurat
Senior Consultant
--
VIP ENTERPRISE 8 | THE POWER OF CONTENT AT WORK

Gauss Interprise AG       Phone:  +49-40-3250-1508
Weidestr. 120 a           Mobile: +49-173-2418472
D-22083 Hamburg           Fax:    +49-40-3250-191508
Germany                   E-Mail: daniel.taurat@gaussvip.com
                          Web:    http://www.gaussvip.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org