lucene-java-user mailing list archives

From Lars Clausen <...@statsbiblioteket.dk>
Subject OutOfMemoryError on small search in large, simple index
Date Tue, 13 Nov 2007 14:37:29 GMT
We've run into a blocking problem with our use of Lucene: we get an
OutOfMemoryError when performing a one-term search in our index. The
search, if completed, should give only a few thousand hits, but a heap
dump suggests that Lucene holds data for many more documents in memory
during the search. Our index has eight fields per document, each of
fairly regular size. The total index size is 170GB, spread over about
400 million documents (about 425 bytes per document).
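As a quick check of those figures, the per-document average can be recomputed as below. This is only a sketch: it assumes decimal units (1 GB = 10^9 bytes), and the class and method names are illustrative.

```java
// Recomputes the average on-disk size per document quoted above.
// Assumption: "170GB" is read as decimal gigabytes (1 GB = 10^9 bytes).
public class IndexSizing {

    /** Average on-disk bytes per document (integer division). */
    static long bytesPerDocument(long indexBytes, long documents) {
        return indexBytes / documents;
    }

    public static void main(String[] args) {
        long indexBytes = 170L * 1_000_000_000L; // "170GB", decimal units
        long documents = 400_000_000L;           // "about 400 million documents"
        System.out.println(bytesPerDocument(indexBytes, documents) + " bytes per document");
        // prints "425 bytes per document"
    }
}
```

With binary units (1 GiB = 2^30 bytes) the same division gives roughly 456 bytes per document.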
The search is a simple TermQuery on a trivial string. The code in
question looks like this (cut together for conciseness):

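public static final String FIELD_URL = "url";
...
// Open a searcher over the on-disk index directory
luceneSearcher = new IndexSearcher(indexDir.getAbsolutePath());
// One-term query against the "url" field
Query query = new TermQuery(new Term(DigestIndexer.FIELD_URL, uri));
try {
    Hits hits = luceneSearcher.search(query); // OutOfMemoryError is thrown from here
    ...
} catch (IOException e) {
    ...
}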

Stack trace:
Oct 11, 2007 4:02:19 PM org.slf4j.impl.JCLLoggerAdapter error
SEVERE: EXCEPTION
java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.index.SegmentReader.getNorms(SegmentReader.java:384)
    at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:393)
    at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:68)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:129)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:99)
    at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65)
    at org.apache.lucene.search.Hits.<init>(Hits.java:44)
    at org.apache.lucene.search.Searcher.search(Searcher.java:44)
    at org.apache.lucene.search.Searcher.search(Searcher.java:36)
    at dk.netarkivet.common.distribute.arcrepository.ARCLookup.luceneLookup(ARCLookup.java:166)
    at dk.netarkivet.common.distribute.arcrepository.ARCLookup.lookup(ARCLookup.java:130)
    at dk.netarkivet.viewerproxy.ARCArchiveAccess.lookup(ARCArchiveAccess.java:126)
    at dk.netarkivet.viewerproxy.NotifyingURIResolver.lookup(NotifyingURIResolver.java:72)
    at dk.netarkivet.viewerproxy.CommandResolver.lookup(CommandResolver.java:80)
    at dk.netarkivet.viewerproxy.CommandResolver.lookup(CommandResolver.java:80)
    at dk.netarkivet.viewerproxy.CommandResolver.lookup(CommandResolver.java:80)
    at dk.netarkivet.viewerproxy.WebProxy.handle(WebProxy.java:129)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:457)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:751)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:500)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:209)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:357)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:217)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:475)
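For scale: the two top frames are in SegmentReader.getNorms. Assuming that norms are held in memory as one byte per document for each field being scored, loaded in full the first time that field is searched, and that the index is a single segment (the trace shows SegmentReader called directly from TermWeight.scorer), a rough estimate of that allocation can be sketched as follows. The class and method names are made up for illustration.

```java
// Back-of-the-envelope estimate of the norms arrays behind SegmentReader.getNorms.
// Assumption (not verified here): one norm byte per document per scored field,
// allocated for every document in the index, not just the matching ones.
public class NormsEstimate {

    /** Bytes needed for norms of {@code fields} fields over {@code maxDoc} documents. */
    static long normsBytes(long maxDoc, int fields) {
        return maxDoc * fields; // one byte per document per field
    }

    public static void main(String[] args) {
        long maxDoc = 400_000_000L; // "about 400 million documents"
        long urlOnly = normsBytes(maxDoc, 1);  // only the "url" field is queried
        long allEight = normsBytes(maxDoc, 8); // upper bound if all eight fields were scored
        System.out.printf("url field only: %d bytes (~%d MiB)%n",
                urlOnly, urlOnly / (1024 * 1024));
        System.out.printf("all eight fields: %d bytes (~%d MiB)%n",
                allEight, allEight / (1024 * 1024));
        // Neither figure depends on the number of hits, only on maxDoc.
    }
}
```

Under these assumptions the url field alone would need roughly 381 MiB, however few documents actually match.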

Can it be right that memory usage depends on the size of the index
rather than the size of the result? Can something be done to reduce
memory usage in such a simple but large scenario?

-Lars


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

