lucene-java-user mailing list archives

From "Rupinder Singh Mazara" <rsmaz...@ebi.ac.uk>
Subject RE: Lucene1.4.1 + OutOf Memory
Date Wed, 10 Nov 2004 11:43:58 GMT
Karthik,

 I think the core problem in your case is the use of compound files. It would
be best to switch that off, or alternatively to issue an optimize as soon as
the indexing is over.

  I am copying the file contents between <file> tags. The patch is to be
applied to TermInfosReader.java; it was done to help with out-of-memory
exceptions while indexing.
  <file>
Index: src/java/org/apache/lucene/index/TermInfosReader.java
===================================================================
RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/TermInfosReader.java,v
retrieving revision 1.9
diff -u -r1.9 TermInfosReader.java
--- src/java/org/apache/lucene/index/TermInfosReader.java	6 Aug 2004 20:50:29 -0000	1.9
+++ src/java/org/apache/lucene/index/TermInfosReader.java	10 Sep 2004 17:46:47 -0000
@@ -45,6 +45,11 @@
     readIndex();
   }

+  protected final void finalize() {
+    // patch for pre-1.4.2 JVMs, whose ThreadLocals leak
+    enumerators.set(null);
+  }
+
   public int getSkipInterval() {
     return origEnum.skipInterval;
   }
</file>
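
 For readers on newer JVMs, the idea behind the patch (dropping a per-thread
 cached value once its owner is done with it) can be sketched in isolation.
 This is only an illustration under assumptions: PerThreadEnumCache and its
 methods get(), release() and hasValueForCurrentThread() are hypothetical
 names, not Lucene classes, and the sketch relies on generics and
 ThreadLocal.remove(), which need Java 5 or later and so were not an option
 for the JDK 1.4-era patch above.

```java
// Hypothetical stand-in for a reader that caches one helper object per thread.
// It is NOT Lucene's TermInfosReader; it only mirrors the shape of the idea.
final class PerThreadEnumCache {
    // One cached value per thread, like the "enumerators" field in the patch.
    private final ThreadLocal<StringBuilder> enumerators = new ThreadLocal<>();

    // Returns the calling thread's cached value, creating it on first use.
    StringBuilder get() {
        StringBuilder cached = enumerators.get();
        if (cached == null) {
            cached = new StringBuilder("enum");
            enumerators.set(cached);
        }
        return cached;
    }

    // Explicitly drops the calling thread's value so it can be collected.
    void release() {
        enumerators.remove();
    }

    // True if the calling thread currently holds a cached value.
    boolean hasValueForCurrentThread() {
        return enumerators.get() != null;
    }
}
```

 Note that release() only clears the value held by the thread that calls it;
 other threads keep theirs until they call release() themselves.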



 However, Tomcat does react in strange ways to too many open files.
 Try to restrict the number of IndexReader or Searchable objects
 that you create while doing searches.
 I usually keep one object to handle all my user requests:

    public static Searcher fetchCitationSearcher(HttpServletRequest request)
            throws Exception {
        // Reuse the single searcher cached in the servlet context, if any.
        Searcher rval = (Searcher) request.getSession().getServletContext()
                .getAttribute("luceneSearchable");
        if (rval == null) {
            // First request: create the searcher and cache it for later ones.
            rval = new IndexSearcher(fetchCitationReader(request));
            request.getSession().getServletContext()
                    .setAttribute("luceneSearchable", rval);
        }
        return rval;
    }
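
 A possible variation on the method above, shown as a self-contained sketch
 rather than servlet code. The names SharedInstanceCache and fetch are
 hypothetical, and a ConcurrentHashMap stands in for the servlet context
 attributes. The point it illustrates is that computeIfAbsent (Java 8 or
 later) runs the factory at most once per key, so two requests arriving at
 the same time would not each build their own searcher, which a null check
 followed by setAttribute may allow.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical holder for objects that should exist once per application,
// such as a single shared searcher.
final class SharedInstanceCache {
    // Stand-in for servlet context attributes (an assumption of this sketch).
    private final ConcurrentMap<String, Object> attributes =
            new ConcurrentHashMap<>();

    // Returns the instance stored under key, creating it with factory
    // only if no instance is stored yet.
    Object fetch(String key, Supplier<Object> factory) {
        return attributes.computeIfAbsent(key, k -> factory.get());
    }
}
```

 With this shape, every caller asking for the same key gets the same
 instance back, and the expensive open happens only on the first call.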




>-----Original Message-----
>From: Karthik N S [mailto:karthik@controlnet.co.in]
>Sent: 10 November 2004 11:41
>To: Lucene Users List
>Subject: RE: Lucene1.4.1 + OutOf Memory
>
>
>Hi
>
>  Rupinder Singh Mazara
>
>Apologies............
>
>
>
>  Can you paste the code into the mail instead of attaching it?
>
>  [ Because I am not able to get the attachment on the company's mail ]
>
>
> Thanks in advance
>Karthik
>
>
>-----Original Message-----
>From: Rupinder Singh Mazara [mailto:rsmazara@ebi.ac.uk]
>Sent: Wednesday, November 10, 2004 3:10 PM
>To: Lucene Users List
>Subject: RE: Lucene1.4.1 + OutOf Memory
>
>
>hi all
>
> I had a similar problem with JDK 1.4.1. Doug had sent me a patch, which I am
>attaching. The following is the mail from Doug:
>
> It sounds like the ThreadLocal in TermInfosReader is not getting
>correctly garbage collected when the TermInfosReader is collected.
>Researching a bit, this was a bug in JVMs prior to 1.4.2, so my guess is
>that you're running in an older JVM.  Is that right?
>
>I've attached a patch which should fix this.  Please tell me if it works
>for you.
>
>Doug
>
>Daniel Taurat wrote:
>> Okay, that (1.4rc3)worked fine, too!
>> Got only 257 SegmentTermEnums for 1900 objects.
>>
>> Now I will go for the final test on the production server with the
>> 1.4rc3 version  and about 40.000 objects.
>>
>> Daniel
>>
>> Daniel Taurat wrote:
>>
>>> Hi all,
>>> here is some update for you:
>>> I switched back to Lucene 1.3-final and now the  number of the
>>> SegmentTermEnum objects is controlled by gc again:
>>> it goes up to about 1000 and then it is down again to 254 after
>>> indexing my 1900 test-objects.
>>> Stay tuned, I will try 1.4RC3 now, the last version before FieldCache
>>> was introduced...
>>>
>>> Daniel
>>>
>>>
>>> Rupinder Singh Mazara wrote:
>>>
>>>> hi all
>>>>  I had a similar problem. I have a database of documents with 24
>>>> fields and an average content size of 7K, with 16M+ records.
>>>>
>>>>  I had to split the jobs into slabs of 1M each and merge the
>>>> resulting indexes. Submissions to our job queue looked like
>>>>
>>>>  java -Xms100M -Xcompactexplicitgc -cp $CLASSPATH lucene.Indexer 22
>>>>
>>>> and I still had an OutOfMemory exception. The solution that I created
>>>> was to create a temp directory after every 200K documents and merge
>>>> them together. This was done for the first production run; updates
>>>> are now being handled incrementally.
>>>>
>>>>
>>>>
>>>> Exception in thread "main" java.lang.OutOfMemoryError
>>>>     at org.apache.lucene.store.RAMOutputStream.flushBuffer(RAMOutputStream.java(Compiled Code))
>>>>     at org.apache.lucene.store.OutputStream.flush(OutputStream.java(Inlined Compiled Code))
>>>>     at org.apache.lucene.store.OutputStream.writeByte(OutputStream.java(Inlined Compiled Code))
>>>>     at org.apache.lucene.store.OutputStream.writeBytes(OutputStream.java(Compiled Code))
>>>>     at org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java(Compiled Code))
>>>>     at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java(Compiled Code))
>>>>     at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java(Compiled Code))
>>>>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java(Compiled Code))
>>>>     at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java(Compiled Code))
>>>>     at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:366)
>>>>     at lucene.Indexer.doIndex(CDBIndexer.java(Compiled Code))
>>>>     at lucene.Indexer.main(CDBIndexer.java:168)
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Daniel Taurat [mailto:daniel.taurat@gaussvip.com]
>>>>> Sent: 10 September 2004 14:42
>>>>> To: Lucene Users List
>>>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large
>>>>> number of documents
>>>>>
>>>>>
>>>>> Hi Pete,
>>>>> good hint, but we actually do have physical memory of  4Gb on the
>>>>> system. But then: we also have experienced that the gc of ibm
>>>>> jdk1.3.1 that we use is sometimes
>>>>> behaving strangely with too large heap space anyway. (Limit seems to
>>>>> be 1.2 Gb)
>>>>> I can say that gc is not collecting these objects since I  forced gc
>>>>> runs when indexing every now and then (when parsing pdf-type
>>>>> objects, that is): No effect.
>>>>>
>>>>> regards,
>>>>>
>>>>> Daniel
>>>>>
>>>>>
>>>>> Pete Lewis wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Hi all
>>>>>>
>>>>>> Reading the thread with interest, there is another way I've come
>>>>>> across out of memory errors when indexing large batches of documents.
>>>>>>
>>>>>> If you have your heap space settings too high, then you get swapping
>>>>>> (which impacts performance) plus you never reach the trigger for
>>>>>> garbage collection, hence you don't garbage collect and hence you
>>>>>> run out of memory.
>>>>>>
>>>>>> Can you check whether or not your garbage collection is being
>>>>>> triggered?
>>>>>>
>>>>>> Anomalously therefore if this is the case, by reducing the heap
>>>>>> space you can improve performance and get rid of the out of memory
>>>>>> errors.
>>>>>>
>>>>>> Cheers
>>>>>> Pete Lewis
>>>>>>
>>>>>> ----- Original Message ----- From: "Daniel Taurat"
>>>>>> <daniel.taurat@gaussvip.com>
>>>>>> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>>>>>> Sent: Friday, September 10, 2004 1:10 PM
>>>>>> Subject: Re: Out of memory in lucene 1.4.1 when re-indexing large
>>>>>> number of documents
>>>>>>
>>>>>>> Daniel Aber wrote:
>>>>>>>
>>>>>>>> On Thursday 09 September 2004 19:47, Daniel Taurat wrote:
>>>>>>>>
>>>>>>>>> I am facing an out of memory problem using  Lucene 1.4.1.
>>>>>>>>
>>>>>>>> Could you try with a recent CVS version? There has been a fix
>>>>>>>> about files not being deleted after 1.4.1. Not sure if that could
>>>>>>>> cause the problems you're experiencing.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Daniel
>>>>>>>
>>>>>>> Well, it seems not to be files, it looks more like those
>>>>>>> SegmentTermEnum objects accumulating in memory.
>>>>>>> I've seen some discussion on these objects in the
>>>>>>> developer-newsgroup that had taken place some time ago.
>>>>>>> I am afraid this is some kind of runaway caching I have to
>>>>>>> deal with.
>>>>>>> Maybe not correctly addressed in this newsgroup, after all...
>>>>>>>
>>>>>>> Anyway: any idea if there is an API command to re-init caches?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Daniel
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>>>>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>>-----Original Message-----
>>From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
>>Sent: 10 November 2004 09:35
>>To: Lucene Users List
>>Subject: Re: Lucene1.4.1 + OutOf Memory
>>
>>
>>On Nov 10, 2004, at 1:55 AM, Karthik N S wrote:
>>>
>>> Hi
>>> Guys
>>>
>>> Apologies..........
>>
>>No need to apologize for asking questions.
>>
>>> History
>>>
>>> Ist type :  40000  subindexes   +  MultiSearcher  + Search on Content
>>> Field
>>
>>You've got 40,000 indexes aggregated under a MultiSearcher and you're
>>wondering why you're running out of memory?!  :O
>>
>>> Exception  [ Too many Files Open ]
>>
>>Are you using the compound file format?
>>
>>	Erik

