lucene-dev mailing list archives

From robert engels <reng...@ix.netcom.com>
Subject Re: [jira] Resolved: (LUCENE-1053) OutOfMemoryError on search in large, simple index
Date Tue, 13 Nov 2007 13:06:31 GMT
The user list is the appropriate spot, but this brings up a  
discussion point.

The norms require 1 byte per document for each field being searched,
so with roughly 400 million documents you will need at least 512 MB
of heap.
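
For a rough check of that figure: with one norm byte per document for each field whose norms are loaded, the ~400 million documents in this report come to about 400 MB for the single field being searched, and eight times that if norms for all eight fields were loaded. The sketch below only does the arithmetic; the class and method names are made up for illustration, and the document count is taken from the issue report.

```java
public class NormsHeapEstimate {

    // One norm byte per document, for each field whose norms are loaded.
    static long normsBytes(long maxDoc, int fieldsWithLoadedNorms) {
        return maxDoc * fieldsWithLoadedNorms;
    }

    // Convert a byte count to mebibytes for display.
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long maxDoc = 400_000_000L; // approximate document count from the report

        long oneField = normsBytes(maxDoc, 1);   // only the searched "url" field
        long allFields = normsBytes(maxDoc, 8);  // if norms for all eight fields were loaded

        System.out.printf("one field:    %,d bytes (%.0f MiB)%n", oneField, toMiB(oneField));
        System.out.printf("eight fields: %,d bytes (%.0f MiB)%n", allFields, toMiB(allFields));
    }
}
```

For one field this is about 381 MiB, so 512 MB leaves limited room for everything else the searcher and the application allocate.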

Start the java process with -Xmx512m and see what happens.
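
To confirm the setting actually took effect, the JVM reports its heap ceiling through Runtime.maxMemory(). A minimal sketch (class and method names are hypothetical) that prints it and compares it with the one-field norms estimate:

```java
public class HeapCheck {

    // The most heap the JVM will try to use; reflects -Xmx when it is given.
    // Runtime.maxMemory() returns Long.MAX_VALUE if there is no inherent limit.
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // True if an allocation of `needed` bytes is below the heap ceiling.
    static boolean fits(long needed, long maxHeap) {
        return needed < maxHeap;
    }

    public static void main(String[] args) {
        long max = maxHeapBytes();
        long normsOneField = 400_000_000L; // ~400M documents x 1 byte, from the report
        System.out.println("max heap: " + max / (1024 * 1024) + " MiB");
        System.out.println("one field's norms fit below the ceiling: " + fits(normsOneField, max));
    }
}
```

Run it with the same -Xmx flag as the search process; the reported ceiling may come out somewhat below the -Xmx value, so the margin over 400 MB of norms can be thinner than it looks.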

Depending on what you are doing, you might be able to "omit the
norms", but currently this doesn't save any memory, since the reader
substitutes a 'fake norms' array of the same size, BUT...

Maybe Lucene should be changed to not create the 'fake norms' array,
and instead, if norms() returns null, not dereference the norm value
but use

DefaultSimilarity.encodeNorm(1.0f)

on the fly. (This is what our branch does.) The memory savings are
huge for a large index, and many Lucene applications do not need
the norms (hence the 'omit norms' option).
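
To illustrate why omitting norms doesn't help today, here is a self-contained sketch (not Lucene's actual code) of the fallback: when a field has no norms, the reader still materializes an array of maxDoc bytes, every one set to the default. The constant 124 stands in for DefaultSimilarity.encodeNorm(1.0f), which is the value that call yields under Lucene's 8-bit norm encoding; it is hard-coded only so the example compiles without Lucene, so treat it as an assumption.

```java
import java.util.Arrays;

public class FakeNormsCost {

    // Stand-in for DefaultSimilarity.encodeNorm(1.0f); hard-coded (assumed) so no Lucene jar is needed.
    static final byte DEFAULT_NORM = 124;

    // Sketch of the fallback used when a field has no norms:
    // one byte per document, every byte set to the default norm.
    static byte[] fakeNorms(int maxDoc) {
        byte[] ones = new byte[maxDoc];
        Arrays.fill(ones, DEFAULT_NORM);
        return ones;
    }

    public static void main(String[] args) {
        int maxDoc = 10_000_000;
        byte[] realNorms = new byte[maxDoc]; // stands in for norms read from the index
        byte[] fake = fakeNorms(maxDoc);     // what a field with omitted norms still costs
        System.out.println("real norms: " + realNorms.length + " bytes");
        System.out.println("fake norms: " + fake.length + " bytes");
    }
}
```

Either way the heap cost is maxDoc bytes per field, which is exactly the allocation an on-the-fly default would avoid.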

This would require changes to all of the calls to norms() (about 25
instances) and to some scorer code, since it dereferences the norms
array directly.

The simplest solution would be to change

byte[] norms()

to

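private static final byte DEFAULT_NORM = DefaultSimilarity.encodeNorm(1.0f);

// Returns the encoded norm for doc, or the default norm (boost 1.0f)
// when the field has no norms, so no fake norms array is ever allocated.
int norm(int doc)
{
	return norms == null ? DEFAULT_NORM : norms[doc];
}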

The trivial method should be inlined by the JIT, so the performance
hit would be negligible. Then all users of the norms array would be
changed to call norm(doc).
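
Putting the two pieces together, here is a self-contained sketch of the proposed accessor and of a scorer-style caller switched from norms[doc] to norm(doc). FieldNorms, scoreAll and DEFAULT_NORM are hypothetical names, the constant 124 again stands in for DefaultSimilarity.encodeNorm(1.0f), and decodeNorm mimics Lucene's 3-bit-mantissa byte-to-float norm decoding rather than calling it, so the example runs without Lucene.

```java
public class NormAccessorSketch {

    // Stand-in for DefaultSimilarity.encodeNorm(1.0f) (assumed value).
    static final byte DEFAULT_NORM = 124;

    // Hypothetical per-field norms holder; a null array means norms were omitted.
    static final class FieldNorms {
        private final byte[] norms;

        FieldNorms(byte[] norms) {
            this.norms = norms;
        }

        // Proposed accessor: no fake array, the default is supplied on the fly.
        int norm(int doc) {
            return norms == null ? DEFAULT_NORM : norms[doc];
        }
    }

    // Byte-to-float decoding with a 3-bit mantissa and exponent bias 15,
    // modelled on (not calling) Lucene's norm decoding.
    static float decodeNorm(int norm) {
        int b = norm & 0xFF; // treat the encoded norm as unsigned
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b << 21) + ((63 - 15) << 24);
        return Float.intBitsToFloat(bits);
    }

    // Caller after the change: norm(doc) instead of norms[doc], with no null check.
    static float[] scoreAll(FieldNorms field, int[] docs, float[] rawScores) {
        float[] scores = new float[docs.length];
        for (int i = 0; i < docs.length; i++) {
            scores[i] = rawScores[i] * decodeNorm(field.norm(docs[i]));
        }
        return scores;
    }

    public static void main(String[] args) {
        FieldNorms omitted = new FieldNorms(null); // field indexed with norms omitted
        float[] scores = scoreAll(omitted, new int[] {0, 2}, new float[] {1.5f, 2.0f});
        System.out.println(scores[0] + " " + scores[1]);
    }
}
```

Because the accessor returns an int, a caller that needs the raw byte can cast it back; masking with 0xFF, as decodeNorm does, keeps encoded values above 127 from being misread as negative.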

On Nov 13, 2007, at 6:39 AM, Grant Ingersoll (JIRA) wrote:

>
>      [ https://issues.apache.org/jira/browse/LUCENE-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Grant Ingersoll resolved LUCENE-1053.
> -------------------------------------
>
>        Resolution: Invalid
>     Lucene Fields:   (was: [New])
>
> Hi Lars,
>
> Generally we recommend you open discussion of issues you are having
> with your application's use of Lucene by asking questions on the
> java-user mailing list.  What you are reporting doesn't necessarily
> sound like a bug in Lucene, so let's discuss it on java-user first
> and hopefully we can get you some answers there.
>
> Start by posting what you have here, plus what your heap settings
> are, etc.  Lucene doesn't scale infinitely (nor does any search
> application or, for that matter, any program); when you reach a
> certain index size, you will have to start doing things like
> distributed search, where you split your index across two or more
> machines.  You _MAY_ have hit those limits and may need to
> distribute your search.
>
> Cheers,
> Grant
>
>> OutOfMemoryError on search in large, simple index
>> -------------------------------------------------
>>
>>                 Key: LUCENE-1053
>>                 URL: https://issues.apache.org/jira/browse/LUCENE-1053
>>             Project: Lucene - Java
>>          Issue Type: Bug
>>          Components: Search
>>    Affects Versions: 2.0.0
>>         Environment: Red Hat Enterprise Linux ES release 3 (Taroon Update 9)
>> Linux sb-test-acs-001 2.4.21-47.0.1.ELsmp #1 SMP Fri Oct 13 17:56:20 EDT 2006 i686 i686 i386 GNU/Linux
>> 2 GB RAM
>> Java version 1.5.0_13
>>            Reporter: Lars Clausen
>>
>> We get OutOfMemoryError when performing a one-term search in our
>> index.  The search, if completed, should give only a few thousand
>> hits, but from inspecting a heap dump it appears that many more
>> documents in the index get stored in Lucene during the search.  Our
>> index consists of eight fields per document, fairly regularly
>> sized; the total index size is 170GB, spread over about 400
>> million documents (425 bytes per document).  The search is a
>> simple TermQuery, the search term a trivial string, and the code in
>> question looks like this (cut together for conciseness):
>> 	public static final String FIELD_URL = "url";
>> ...
>>         luceneSearcher = new IndexSearcher(indexDir.getAbsolutePath());
>>         Query query = new TermQuery(new Term(DigestIndexer.FIELD_URL, uri));
>>         try {
>>             Hits hits = luceneSearcher.search(query);
>> Stack trace:
>> Oct 11, 2007 4:02:19 PM org.slf4j.impl.JCLLoggerAdapter error
>> SEVERE: EXCEPTION
>> java.lang.OutOfMemoryError: Java heap space
>>         at org.apache.lucene.index.SegmentReader.getNorms(SegmentReader.java:384)
>>         at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:393)
>>         at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:68)
>>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:129)
>>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:99)
>>         at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:65)
>>         at org.apache.lucene.search.Hits.<init>(Hits.java:44)
>>         at org.apache.lucene.search.Searcher.search(Searcher.java:44)
>>         at org.apache.lucene.search.Searcher.search(Searcher.java:36)
>>         at dk.netarkivet.common.distribute.arcrepository.ARCLookup.luceneLookup(ARCLookup.java:166)
>>         at dk.netarkivet.common.distribute.arcrepository.ARCLookup.lookup(ARCLookup.java:130)
>>         at dk.netarkivet.viewerproxy.ARCArchiveAccess.lookup(ARCArchiveAccess.java:126)
>>         at dk.netarkivet.viewerproxy.NotifyingURIResolver.lookup(NotifyingURIResolver.java:72)
>>         at dk.netarkivet.viewerproxy.CommandResolver.lookup(CommandResolver.java:80)
>>         at dk.netarkivet.viewerproxy.CommandResolver.lookup(CommandResolver.java:80)
>>         at dk.netarkivet.viewerproxy.CommandResolver.lookup(CommandResolver.java:80)
>>         at dk.netarkivet.viewerproxy.WebProxy.handle(WebProxy.java:129)
>>         at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>>         at org.mortbay.jetty.Server.handle(Server.java:285)
>>         at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:457)
>>         at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:751)
>>         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:500)
>>         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:209)
>>         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:357)
>>         at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:217)
>>         at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:475)
>
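
On Grant's suggestion of distributed search: the merge side of it can be sketched as combining each shard's top hits into a single global top-N by score. The Hit class and the shard data below are hypothetical, and the sketch assumes scores are comparable across shards, which in practice usually requires shared (global) term statistics; Lucene's own MultiSearcher and ParallelMultiSearcher do this kind of merging over Searchable instances.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ShardMerge {

    // Hypothetical hit from one shard: which shard, its local doc id, and its score.
    static final class Hit {
        final int shard;
        final int doc;
        final float score;

        Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    static final Comparator<Hit> BY_SCORE = Comparator.comparingDouble(h -> h.score);

    // Keep the global top n across all shards using a min-heap of size n.
    static List<Hit> mergeTopN(List<List<Hit>> perShard, int n) {
        PriorityQueue<Hit> heap = new PriorityQueue<>(BY_SCORE); // lowest score at the head
        for (List<Hit> hits : perShard) {
            for (Hit h : hits) {
                heap.offer(h);
                if (heap.size() > n) {
                    heap.poll(); // evict the weakest hit seen so far
                }
            }
        }
        List<Hit> top = new ArrayList<>(heap);
        top.sort(BY_SCORE.reversed()); // best first
        return top;
    }

    // Two made-up shards, each already returning its own top hits.
    static List<List<Hit>> sampleShards() {
        return Arrays.asList(
            Arrays.asList(new Hit(0, 7, 3.0f), new Hit(0, 2, 1.0f)),
            Arrays.asList(new Hit(1, 4, 2.5f), new Hit(1, 9, 0.5f)));
    }

    public static void main(String[] args) {
        for (Hit h : mergeTopN(sampleShards(), 3)) {
            System.out.println("shard " + h.shard + " doc " + h.doc + " score " + h.score);
        }
    }
}
```

Each shard only needs to return its own top n for the merge to be exact, since any hit in the global top n must also be in its own shard's top n.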

