Date: Tue, 13 Nov 2007 15:37:29 +0100
From: Lars Clausen
Organization: Statsbiblioteket
Subject: OutOfMemoryError on small search in large, simple index
To: java-user@lucene.apache.org

We've run into a blocking problem with our use of Lucene: we get
OutOfMemoryError when performing a
one-term search in our index. The search, if it completed, should return
only a few thousand hits, but a heap dump suggests that Lucene holds data
for many more of the index's documents in memory during the search.

Our index has eight fields per document, all fairly regular in size. The
total index size is 170GB, spread over about 400 million documents (425
bytes per document).

The search is a simple TermQuery and the search term is a trivial string.
The code in question looks like this (cut together for conciseness):

    public static final String FIELD_URL = "url";
    ...
    luceneSearcher = new IndexSearcher(indexDir.getAbsolutePath());
    Query query = new TermQuery(new Term(DigestIndexer.FIELD_URL, uri));
    try {
        Hits hits = luceneSearcher.search(query);

Stack trace:

Oct 11, 2007 4:02:19 PM org.slf4j.impl.JCLLoggerAdapter error
SEVERE: EXCEPTION
java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.index.SegmentReader.getNorms(SegmentReader.java:384)
        at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:393)
        at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:68)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:129)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:99)
        at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65)
        at org.apache.lucene.search.Hits.<init>(Hits.java:44)
        at org.apache.lucene.search.Searcher.search(Searcher.java:44)
        at org.apache.lucene.search.Searcher.search(Searcher.java:36)
        at dk.netarkivet.common.distribute.arcrepository.ARCLookup.luceneLookup(ARCLookup.java:166)
        at dk.netarkivet.common.distribute.arcrepository.ARCLookup.lookup(ARCLookup.java:130)
        at dk.netarkivet.viewerproxy.ARCArchiveAccess.lookup(ARCArchiveAccess.java:126)
        at dk.netarkivet.viewerproxy.NotifyingURIResolver.lookup(NotifyingURIResolver.java:72)
        at dk.netarkivet.viewerproxy.CommandResolver.lookup(CommandResolver.java:80)
        at dk.netarkivet.viewerproxy.CommandResolver.lookup(CommandResolver.java:80)
        at dk.netarkivet.viewerproxy.CommandResolver.lookup(CommandResolver.java:80)
        at dk.netarkivet.viewerproxy.WebProxy.handle(WebProxy.java:129)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:457)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:751)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:500)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:209)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:357)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:217)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:475)

Can it be right that memory usage depends on the size of the index rather
than the size of the result? Can something be done to reduce memory usage
for such a simple but big scenario?

-Lars
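
For a sense of scale, the figures quoted in this message can be checked
with a few lines of Java. This is only a sketch: the class and method names
are made up for illustration, and the norms estimate rests on an assumption
that is not confirmed anywhere in this message, namely that Lucene keeps one
norm byte per document for each field that has norms and loads them for the
whole index when SegmentReader.getNorms is called (the frame where the stack
trace ends).

```java
// Hypothetical helper, not part of Lucene or of the application code.
final class IndexMemoryEstimate {

    // Figures quoted in the message (170GB taken as decimal gigabytes).
    static final long INDEX_BYTES = 170_000_000_000L;
    static final long DOC_COUNT = 400_000_000L;
    static final int FIELD_COUNT = 8;

    /** Average on-disk size of one document, in bytes. */
    static long bytesPerDocument(long indexBytes, long docCount) {
        return indexBytes / docCount;
    }

    /**
     * Heap needed to hold norms in memory.
     * ASSUMPTION: one byte per document per field that has norms.
     */
    static long normsBytes(long docCount, int fieldsWithNorms) {
        return docCount * fieldsWithNorms;
    }

    public static void main(String[] args) {
        System.out.println("bytes per document: "
                + bytesPerDocument(INDEX_BYTES, DOC_COUNT));
        System.out.println("norms for one field (bytes): "
                + normsBytes(DOC_COUNT, 1));
        System.out.println("norms for all fields (bytes): "
                + normsBytes(DOC_COUNT, FIELD_COUNT));
    }
}
```

If the assumption holds, norms for the single "url" field alone would need
about 400 MB of heap no matter how many hits the query returns, which would
be consistent with memory usage following index size rather than result
size.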