Message-ID: <49EE4B71.8030105@aduna-software.com>
Date: Wed, 22 Apr 2009 00:40:49 +0200
From: Christiaan Fluit
To: java-user@lucene.apache.org
Subject: Re: semi-infinite loop during merging
In-Reply-To: <49EE41F9.3030007@aduna-software.com>

Christiaan Fluit wrote:
> It seems that it gets up to the point to commit, but the "IW:
> commitMerge done" message is never reached.
>
> Furthermore, no exceptions are printed to the output, so
> handleMergeException does not seem to have been invoked.
>
> Should I add more debug statements elsewhere?

I may be on to something already.

I just looked at the commitMerge code and was surprised to see that the commitMerge message that is almost at the beginning of the method wasn't printed either. Then I saw the "if (hitOOM) return false;" part that takes place before that. I think this can only mean that an OOME was encountered at some earlier point.

Now, the fact is that in my indexing code I do a catch(Throwable) in several places. I do this particularly because JET handles OOMEs in a very, very nasty way. Often you will just get an error dialog and then it quits the entire application. Therefore, my client code catches, logs and swallows the OOME before the JET runtime can intercept it. *Usually*, the application can then recover gracefully and continue processing the rest of the information.

Catching an OOME that results from the operation of a text extraction library is one thing (and a fact of life really), but perhaps there are also OOMEs that occur during Lucene processing.
I remember seeing those in the past with the original Java code, when very large Strings were being tokenized and I got an OOME with a deep Lucene stacktrace. I have copied one such saved stacktrace at the end of this mail.

I see some caught and swallowed OOMEs in my log file, but unfortunately they are without a stacktrace - probably again a JET issue. I can run the normal Java build, though, to see whether such OOMEs occur on this dataset.

Now, I wonder:

- when the IW is in auto-commit mode, can the failed processing of a Document due to an OOME have an impact on the processing of subsequent Documents or on the merge/optimize operations? Can the index(writer) become corrupt and result in problems such as these?

- even though commitMerge returns false, it should probably not get into an infinite loop. Is this an internal Lucene problem or is there something I can/should do about it myself?

- more generally, what is the recommended behavior when I get an OOME during Lucene processing, particularly in IW.addDocument? Should the IW be able to recover by itself or is there some sort of rollback I need to perform?

Again, note that my index is in auto-commit mode (though I had hoped to let go of that too; it's only there for historic reasons).
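To make the scenario above concrete, here is a minimal, self-contained sketch in plain Java. It does not use Lucene: ToyWriter, indexAll and the 10-character limit are hypothetical stand-ins, and only the name hitOOM and the "if (hitOOM) return false;" check are taken from the code quoted in this mail. The OOME is simulated with an explicit throw so that the example runs without exhausting the heap. It shows how a sticky flag can make a later commit fail without any log output, and one conservative way for client code to react: log the error and stop using that writer instead of swallowing it and carrying on. Whether a real IndexWriter needs this treatment is exactly the open question asked above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OomeGuardSketch {

    /** Hypothetical stand-in for an index writer; this is NOT Lucene's IndexWriter. */
    static class ToyWriter {
        private final List<String> docs = new ArrayList<String>();
        // Sticky flag, named after the one quoted in the mail; it is never reset.
        private boolean hitOOM = false;

        void addDocument(String text) {
            try {
                if (text.length() > 10) {
                    // Simulated failure (assumption): a real OOME would come from the JVM.
                    throw new OutOfMemoryError("simulated");
                }
                docs.add(text);
            } catch (OutOfMemoryError e) {
                hitOOM = true; // remember the failure, then let the caller see it
                throw e;
            }
        }

        /** Returns false without printing anything once an OOME has been seen. */
        boolean commitMerge() {
            if (hitOOM) return false;
            System.out.println("commitMerge done");
            return true;
        }
    }

    /** Adds texts until the writer reports an OOME; returns how many were added. */
    static int indexAll(ToyWriter writer, List<String> texts) {
        int added = 0;
        for (String text : texts) {
            try {
                writer.addDocument(text);
                added++;
            } catch (OutOfMemoryError e) {
                // Log instead of swallowing silently, and stop using this writer.
                System.err.println("OOME inside writer, abandoning it: " + e.getMessage());
                break;
            }
        }
        return added;
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();
        int added = indexAll(writer,
                Arrays.asList("short", "a very long document", "never reached"));
        // prints: added=1 commit=false
        System.out.println("added=" + added + " commit=" + writer.commitMerge());
    }
}
```

In this toy model, a caller that swallowed the error and kept retrying commitMerge() until it returned true would never terminate, because the flag is never cleared. That only illustrates the suspicion voiced above; it is not a confirmed description of Lucene's merge loop.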
Regards,

Chris
--

java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.index.DocumentsWriter.getPostings(DocumentsWriter.java:3069)
    at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.addPosition(DocumentsWriter.java:1696)
    at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.invertField(DocumentsWriter.java:1525)
    at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.processField(DocumentsWriter.java:1412)
    at org.apache.lucene.index.DocumentsWriter$ThreadState.processDocument(DocumentsWriter.java:1121)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2442)
    at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2424)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1464)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1442)
    at info.aduna...........