Date: Wed, 3 May 2006 18:34:55 +0000 (GMT+00:00)
From: "robert engels (JIRA)"
To: java-dev@lucene.apache.org
Reply-To: java-dev@lucene.apache.org
Subject: [jira] Commented: (LUCENE-436) [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception
Message-ID: <24295361.1146681295757.JavaMail.jira@brutus>
In-Reply-To: <773685739.1127317287745.JavaMail.jira@ajax.apache.org>

    [ http://issues.apache.org/jira/browse/LUCENE-436?page=comments#action_12377607 ]

robert engels commented on LUCENE-436:
--------------------------------------

fwiw, we have
done EXTENSIVE memory profiling of the low-level Lucene (and JVM) routines.

It is my opinion that there are no memory leaks in either. In every case that we "found" a memory leak, it always ended up tied back to something we were doing wrong on our client code.

> [PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception
> ----------------------------------------------------------------
>
>          Key: LUCENE-436
>          URL: http://issues.apache.org/jira/browse/LUCENE-436
>      Project: Lucene - Java
>         Type: Improvement
>   Components: Index
>     Versions: 1.4
>  Environment: Solaris JVM 1.4.1
>               Linux JVM 1.4.2/1.5.0
>               Windows not tested
>     Reporter: kieran
>  Attachments: Lucene-436-TestCase.tar.gz
>
> We've been experiencing terrible memory problems on our production search server, running lucene (1.4.3).
> Our live app regularly opens new indexes and, in doing so, releases old IndexReaders for garbage collection.
> But...there appears to be a memory leak in org.apache.lucene.index.TermInfosReader.java.
> Under certain conditions (possibly related to JVM version, although I've personally observed it under both linux JVM 1.4.2_06, and 1.5.0_03, and SUNOS JVM 1.4.1) the ThreadLocal member variable, "enumerators" doesn't get garbage-collected when the TermInfosReader object is gc-ed.
> Looking at the code in TermInfosReader.java, there's no reason why it _shouldn't_ be gc-ed, so I can only presume (and I've seen this suggested elsewhere) that there could be a bug in the garbage collector of some JVMs.
> I've seen this problem briefly discussed; in particular at the following URL:
>   http://java2.5341.com/msg/85821.html
> The patch that Doug recommended, which is included in lucene-1.4.3 doesn't work in our particular circumstances. Doug's patch only clears the ThreadLocal variable for the thread running the finalizer (my knowledge of java breaks down here - I'm not sure which thread actually runs the finalizer).
> In our situation, the TermInfosReader is (potentially) used by more than one thread, meaning that Doug's patch _doesn't_ allow the affected JVMs to correctly collect garbage.
> So...I've devised a simple patch which, from my observations on linux JVMs 1.4.2_06, and 1.5.0_03, fixes this problem.
> Kieran
> PS Thanks to daniel naber for pointing me to jira/lucene
>
> @@ -19,6 +19,7 @@
>  import java.io.IOException;
>  import org.apache.lucene.store.Directory;
> +import java.util.Hashtable;
>  /** This stores a monotonically increasing set of pairs in a
>   * Directory. Pairs are accessed either by Term or by ordinal position the
> @@ -29,7 +30,7 @@
>    private String segment;
>    private FieldInfos fieldInfos;
> -  private ThreadLocal enumerators = new ThreadLocal();
> +  private final Hashtable enumeratorsByThread = new Hashtable();
>    private SegmentTermEnum origEnum;
>    private long size;
> @@ -60,10 +61,10 @@
>    }
>    private SegmentTermEnum getEnum() {
> -    SegmentTermEnum termEnum = (SegmentTermEnum)enumerators.get();
> +    SegmentTermEnum termEnum = (SegmentTermEnum)enumeratorsByThread.get(Thread.currentThread());
>      if (termEnum == null) {
>        termEnum = terms();
> -      enumerators.set(termEnum);
> +      enumeratorsByThread.put(Thread.currentThread(), termEnum);
>      }
>      return termEnum;
>    }
> @@ -195,5 +196,15 @@
>    public SegmentTermEnum terms(Term term) throws IOException {
>      get(term);
>      return (SegmentTermEnum)getEnum().clone();
> +  }
> +
> +  /* some jvms might have trouble gc-ing enumeratorsByThread */
> +  protected void finalize() throws Throwable {
> +    try {
> +      // make sure gc can clear up.
> +      enumeratorsByThread.clear();
> +    } finally {
> +      super.finalize();
> +    }
>    }
>  }
>
> TermInfosReader.java, full source:
> ======================================
> package org.apache.lucene.index;
> /**
>  * Copyright 2004 The Apache Software Foundation
>  *
>  * Licensed under the Apache License, Version 2.0 (the "License");
>  * you may not use this file except in compliance with the License.
>  * You may obtain a copy of the License at
>  *
>  *     http://www.apache.org/licenses/LICENSE-2.0
>  *
>  * Unless required by applicable law or agreed to in writing, software
>  * distributed under the License is distributed on an "AS IS" BASIS,
>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  * See the License for the specific language governing permissions and
>  * limitations under the License.
>  */
> import java.io.IOException;
> import org.apache.lucene.store.Directory;
> import java.util.Hashtable;
> /** This stores a monotonically increasing set of pairs in a
>  * Directory. Pairs are accessed either by Term or by ordinal position the
>  * set. */
> final class TermInfosReader {
>   private Directory directory;
>   private String segment;
>   private FieldInfos fieldInfos;
>   private final Hashtable enumeratorsByThread = new Hashtable();
>   private SegmentTermEnum origEnum;
>   private long size;
>
>   TermInfosReader(Directory dir, String seg, FieldInfos fis)
>       throws IOException {
>     directory = dir;
>     segment = seg;
>     fieldInfos = fis;
>     origEnum = new SegmentTermEnum(directory.openFile(segment + ".tis"),
>                                    fieldInfos, false);
>     size = origEnum.size;
>     readIndex();
>   }
>
>   public int getSkipInterval() {
>     return origEnum.skipInterval;
>   }
>
>   final void close() throws IOException {
>     if (origEnum != null)
>       origEnum.close();
>   }
>
>   /** Returns the number of term/value pairs in the set.
>    */
>   final long size() {
>     return size;
>   }
>
>   private SegmentTermEnum getEnum() {
>     SegmentTermEnum termEnum = (SegmentTermEnum)enumeratorsByThread.get(Thread.currentThread());
>     if (termEnum == null) {
>       termEnum = terms();
>       enumeratorsByThread.put(Thread.currentThread(), termEnum);
>     }
>     return termEnum;
>   }
>
>   Term[] indexTerms = null;
>   TermInfo[] indexInfos;
>   long[] indexPointers;
>
>   private final void readIndex() throws IOException {
>     SegmentTermEnum indexEnum =
>       new SegmentTermEnum(directory.openFile(segment + ".tii"),
>                           fieldInfos, true);
>     try {
>       int indexSize = (int)indexEnum.size;
>       indexTerms = new Term[indexSize];
>       indexInfos = new TermInfo[indexSize];
>       indexPointers = new long[indexSize];
>       for (int i = 0; indexEnum.next(); i++) {
>         indexTerms[i] = indexEnum.term();
>         indexInfos[i] = indexEnum.termInfo();
>         indexPointers[i] = indexEnum.indexPointer;
>       }
>     } finally {
>       indexEnum.close();
>     }
>   }
>
>   /** Returns the offset of the greatest index entry which is less than or equal to term.*/
>   private final int getIndexOffset(Term term) throws IOException {
>     int lo = 0; // binary search indexTerms[]
>     int hi = indexTerms.length - 1;
>     while (hi >= lo) {
>       int mid = (lo + hi) >> 1;
>       int delta = term.compareTo(indexTerms[mid]);
>       if (delta < 0)
>         hi = mid - 1;
>       else if (delta > 0)
>         lo = mid + 1;
>       else
>         return mid;
>     }
>     return hi;
>   }
>
>   private final void seekEnum(int indexOffset) throws IOException {
>     getEnum().seek(indexPointers[indexOffset],
>                    (indexOffset * getEnum().indexInterval) - 1,
>                    indexTerms[indexOffset], indexInfos[indexOffset]);
>   }
>
>   /** Returns the TermInfo for a Term in the set, or null.
>    */
>   TermInfo get(Term term) throws IOException {
>     if (size == 0) return null;
>
>     // optimize sequential access: first try scanning cached enum w/o seeking
>     SegmentTermEnum enumerator = getEnum();
>     if (enumerator.term() != null // term is at or past current
>         && ((enumerator.prev != null && term.compareTo(enumerator.prev) > 0)
>             || term.compareTo(enumerator.term()) >= 0)) {
>       int enumOffset = (int)(enumerator.position/enumerator.indexInterval)+1;
>       if (indexTerms.length == enumOffset // but before end of block
>           || term.compareTo(indexTerms[enumOffset]) < 0)
>         return scanEnum(term); // no need to seek
>     }
>
>     // random-access: must seek
>     seekEnum(getIndexOffset(term));
>     return scanEnum(term);
>   }
>
>   /** Scans within block for matching term. */
>   private final TermInfo scanEnum(Term term) throws IOException {
>     SegmentTermEnum enumerator = getEnum();
>     while (term.compareTo(enumerator.term()) > 0 && enumerator.next()) {}
>     if (enumerator.term() != null && term.compareTo(enumerator.term()) == 0)
>       return enumerator.termInfo();
>     else
>       return null;
>   }
>
>   /** Returns the nth term in the set. */
>   final Term get(int position) throws IOException {
>     if (size == 0) return null;
>
>     SegmentTermEnum enumerator = getEnum();
>     if (enumerator != null && enumerator.term() != null &&
>         position >= enumerator.position &&
>         position < (enumerator.position + enumerator.indexInterval))
>       return scanEnum(position); // can avoid seek
>
>     seekEnum(position / enumerator.indexInterval); // must seek
>     return scanEnum(position);
>   }
>
>   private final Term scanEnum(int position) throws IOException {
>     SegmentTermEnum enumerator = getEnum();
>     while(enumerator.position < position)
>       if (!enumerator.next())
>         return null;
>     return enumerator.term();
>   }
>
>   /** Returns the position of a Term in the set or -1.
>    */
>   final long getPosition(Term term) throws IOException {
>     if (size == 0) return -1;
>
>     int indexOffset = getIndexOffset(term);
>     seekEnum(indexOffset);
>
>     SegmentTermEnum enumerator = getEnum();
>     while(term.compareTo(enumerator.term()) > 0 && enumerator.next()) {}
>     if (term.compareTo(enumerator.term()) == 0)
>       return enumerator.position;
>     else
>       return -1;
>   }
>
>   /** Returns an enumeration of all the Terms and TermInfos in the set. */
>   public SegmentTermEnum terms() {
>     return (SegmentTermEnum)origEnum.clone();
>   }
>
>   /** Returns an enumeration of terms starting at or after the named term. */
>   public SegmentTermEnum terms(Term term) throws IOException {
>     get(term);
>     return (SegmentTermEnum)getEnum().clone();
>   }
>
>   /* some jvms might have trouble gc-ing enumeratorsByThread */
>   protected void finalize() throws Throwable {
>     try {
>       // make sure gc can clear up.
>       enumeratorsByThread.clear();
>     } finally {
>       super.finalize();
>     }
>   }
> }

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
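
Illustration of the point in the quoted report that clearing a ThreadLocal from the finalizer only affects the thread running the finalizer, whereas clearing a map keyed by Thread drops every thread's entry. This is a minimal sketch and not Lucene code: the class and method names are hypothetical, plain strings stand in for SegmentTermEnum, an ordinary thread stands in for the finalizer thread, and it uses generics and lambdas that the 1.4-era JVMs in the report did not have. It shows only which entries a cross-thread clear can reach; it does not reproduce or measure the garbage-collection behaviour itself.

```java
import java.util.Hashtable;
import java.util.Map;

// Hypothetical sketch: compares the two per-thread caching schemes from the report.
final class PerThreadCacheSketch {

    /** Runs the task on a fresh thread (standing in for the finalizer thread) and waits for it. */
    static void runOnOtherThread(Runnable task) {
        Thread other = new Thread(task);
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** ThreadLocal scheme: another thread calling set(null) only clears its own slot. */
    static boolean threadLocalValueSurvivesForeignClear() {
        final ThreadLocal<String> enumerators = new ThreadLocal<String>();
        enumerators.set("enum-for-caller");            // cached by the calling thread
        runOnOtherThread(() -> enumerators.set(null)); // "finalizer" clears only its own value
        return "enum-for-caller".equals(enumerators.get());
    }

    /** Map-keyed-by-Thread scheme: clear() from any thread removes every thread's entry. */
    static int mapEntriesLeftAfterForeignClear() {
        final Map<Thread, String> enumeratorsByThread = new Hashtable<Thread, String>();
        enumeratorsByThread.put(Thread.currentThread(), "enum-for-caller");
        runOnOtherThread(enumeratorsByThread::clear);  // "finalizer" clears the whole table
        return enumeratorsByThread.size();
    }

    public static void main(String[] args) {
        System.out.println(threadLocalValueSurvivesForeignClear()); // prints "true"
        System.out.println(mapEntriesLeftAfterForeignClear());      // prints "0"
    }
}
```

One trade-off of the map-based scheme: until clear() runs, the table holds strong references to each Thread key and its cached value, so entries for threads that have already exited stay reachable for as long as the owning object does.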