lucene-solr-user mailing list archives

From Dave <dla...@gmail.com>
Subject Trying to understand SOLR memory requirements
Date Tue, 17 Jan 2012 01:31:21 GMT
I'm trying to figure out what my memory needs are for a rather large
dataset. I'm trying to build an auto-complete system for every
city/state/country in the world. I've got a geographic database, and have
set up the DIH to pull the proper data in. There are 2,784,937 documents
which I've formatted into JSON-like output, so there's a bit of data
associated with each one. Here is an example record:

Brooklyn, New York, United States?{ |id|: |2620829|,
|timezone|:|America/New_York|,|type|: |3|, |country|: { |id| : |229| },
|region|: { |id| : |3608| }, |city|: { |id|: |2616971|, |plainname|:
|Brooklyn|, |name|: |Brooklyn, New York, United States| }, |hint|:
|2300664|, |label|: |Brooklyn, New York, United States|, |value|:
|Brooklyn, New York, United States|, |title|: |Brooklyn, New York, United
States| }

I've got the spellchecker / suggester module set up, and I can confirm that
everything works properly with a smaller dataset (i.e. just a couple of
countries' worth of cities/states). However, I'm running into a big problem
when I try to index the entire dataset. The dataimport?command=full-import
command works and the system returns to an idle state. It generates the
following data/index/ directory (I'm including it in case it gives any
indication of memory requirements):

-rw-rw---- 1 root   root   2.2G Jan 17 00:13 _2w.fdt
-rw-rw---- 1 root   root    22M Jan 17 00:13 _2w.fdx
-rw-rw---- 1 root   root    131 Jan 17 00:13 _2w.fnm
-rw-rw---- 1 root   root   134M Jan 17 00:13 _2w.frq
-rw-rw---- 1 root   root    16M Jan 17 00:13 _2w.nrm
-rw-rw---- 1 root   root   130M Jan 17 00:13 _2w.prx
-rw-rw---- 1 root   root   9.2M Jan 17 00:13 _2w.tii
-rw-rw---- 1 root   root   1.1G Jan 17 00:13 _2w.tis
-rw-rw---- 1 root   root     20 Jan 17 00:13 segments.gen
-rw-rw---- 1 root   root    291 Jan 17 00:13 segments_2

Next I try to run the suggest?spellcheck.build=true command, and I get the
following error:

Jan 16, 2012 4:01:47 PM org.apache.solr.spelling.suggest.Suggester build
INFO: build()
Jan 16, 2012 4:03:27 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOfRange(Arrays.java:3209)
    at java.lang.String.<init>(String.java:215)
    at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
    at org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:184)
    at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:203)
    at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:172)
    at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:509)
    at org.apache.lucene.index.DirectoryReader.docFreq(DirectoryReader.java:719)
    at org.apache.solr.search.SolrIndexReader.docFreq(SolrIndexReader.java:309)
    at org.apache.lucene.search.spell.HighFrequencyDictionary$HighFrequencyIterator.isFrequent(HighFrequencyDictionary.java:75)
    at org.apache.lucene.search.spell.HighFrequencyDictionary$HighFrequencyIterator.hasNext(HighFrequencyDictionary.java:125)
    at org.apache.lucene.search.suggest.fst.FSTLookup.build(FSTLookup.java:157)
    at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:70)
    at org.apache.solr.spelling.suggest.Suggester.build(Suggester.java:133)
    at org.apache.solr.handler.component.SpellCheckComponent.prepare(SpellCheckComponent.java:109)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)


I also get an error if, after the dataimport command completes, I just exit
the SOLR process and restart it:

Jan 16, 2012 4:06:15 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.util.fst.NodeHash.rehash(NodeHash.java:158)
    at org.apache.lucene.util.fst.NodeHash.add(NodeHash.java:128)
    at org.apache.lucene.util.fst.Builder.compileNode(Builder.java:161)
    at org.apache.lucene.util.fst.Builder.compilePrevTail(Builder.java:247)
    at org.apache.lucene.util.fst.Builder.add(Builder.java:364)
    at org.apache.lucene.search.suggest.fst.FSTLookup.buildAutomaton(FSTLookup.java:486)
    at org.apache.lucene.search.suggest.fst.FSTLookup.build(FSTLookup.java:179)
    at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:70)
    at org.apache.solr.spelling.suggest.Suggester.build(Suggester.java:133)
    at org.apache.solr.spelling.suggest.Suggester.reload(Suggester.java:153)
    at org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener.newSearcher(SpellCheckComponent.java:675)
    at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1181)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

Jan 16, 2012 4:06:15 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [places] Registered new searcher Searcher@34b0ede5 main



Basically, this means that once I've run a full-import, I cannot exit the
SOLR process, because I get this error every time I restart it. I've tried
different -Xmx arguments, and I'm really at a loss at this point. Is there
any guideline for how much RAM I need? I've got 8GB on this machine,
although that could be increased if necessary. However, I can't understand
why it would need so much memory. Could I have something configured
incorrectly? I've been over the configs several times, trying to get them
down to the bare minimum.
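
One way to confirm that a given -Xmx value is really taking effect is to ask
a JVM what heap ceiling it sees. The following is only a minimal sketch:
HeapCheck is a hypothetical standalone class, not part of Solr, and it
assumes it is launched with the same JVM flags as the SOLR process. It uses
only standard java.lang and java.lang.management calls.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, not part of Solr: reports the heap limits this JVM sees.
public class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in MiB (reflects -Xmx when set). */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Heap currently occupied by objects, in MiB. */
    public static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Flags the JVM was actually started with, useful for spotting a missing -Xmx.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM args:        " + jvmArgs);
        System.out.println("Max heap (MiB):  " + maxHeapMiB());
        System.out.println("Used heap (MiB): " + usedHeapMiB());
    }
}
```

Running it as, for example, java -Xmx4g HeapCheck should report a maximum
close to the requested value; a much smaller number would suggest the flag
is not reaching the JVM.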

Thanks for any assistance!

Dave
