lucene-dev mailing list archives

From Tom Burton-West <tburt...@umich.edu>
Subject Re: Estimating peak memory use for UnInvertedField faceting
Date Mon, 11 Nov 2013 16:33:46 GMT
Thanks Otis,

 I'm looking forward to the presentation videos.

I'll look into using DocValues.  Re-indexing 200 million docs will take a
while though :).
Will Solr automatically use DocValues for faceting if the field has
DocValues, or is there some configuration or parameter that needs to be
set?
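(Editor's note: for reference, per-field DocValues is declared in schema.xml; a minimal sketch, assuming a Solr 4.x string field named topicStr, the facet field in the logs below. Reindexing is required after the change.)

```xml
<!-- Sketch: declaring DocValues on the facet field (requires a full reindex). -->
<field name="topicStr" type="string" indexed="true" stored="false"
       multiValued="true" docValues="true"/>
```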

Tom


On Sat, Nov 9, 2013 at 9:57 AM, Otis Gospodnetic <otis.gospodnetic@gmail.com> wrote:

> Hi Tom,
>
> Check http://blog.sematext.com/2013/11/09/presentation-solr-for-analytics/
> .  It includes info about our experiment with DocValues, which clearly
> shows lower heap usage, so you'll get further before hitting this OOM.
> In our experiments we didn't sort, facet, or group, and I see you are
> faceting, which means that DocValues, which are more efficient than
> FieldCache, should help you even more than they helped us.
>
> The graphs are from SPM, which you could use to monitor your Solr
> cluster, at least while you are tuning it.
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Fri, Nov 8, 2013 at 2:41 PM, Tom Burton-West <tburtonw@umich.edu> wrote:
> > Hi Yonik,
> >
> > I don't know enough about JVM tuning and monitoring to do this in a clean
> > way, so I just tried setting the max heap at 8GB and then 6GB to force
> > garbage collection.  With it set to 6GB it goes into a long GC loop and
> > then runs out of heap (see below).  The stack trace says the issue is
> > with DocTermOrds.uninvert:
> > Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> > at org.apache.lucene.index.DocTermOrds.uninvert(DocTermOrds.java:405)
> >
> >  I'm guessing the actual peak is somewhere between 6 and 8 GB.
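(Editor's note: one way to see the true high-water mark instead of bisecting -Xmx is to enable GC logging and watch heap occupancy before and after collections. A sketch, assuming a JDK 7-era HotSpot JVM under Tomcat; the log path and CATALINA_OPTS wiring are placeholders, not taken from this thread.)

```shell
# Sketch: enable GC logging so the heap high-water mark shows up in the log.
# JDK 7-era HotSpot flags; adjust -Xmx and the log path for your setup.
CATALINA_OPTS="$CATALINA_OPTS -Xmx8g \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:/var/log/solr/gc.log"
export CATALINA_OPTS
```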
> >
> > BTW: is there some documentation somewhere that explains what the stats
> > output to INFO mean?
> >
> > Tom
> >
> >
> > java.lang.OutOfMemoryError: GC overhead limit exceeded</str><str name="trace">java.lang.RuntimeException: java.lang.OutOfMemoryError: GC overhead limit exceeded
> > at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:653)
> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:366)
> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
> > at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
> > at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
> > at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
> > at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
> > at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:548)
> > at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> > at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
> > at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
> > at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
> > at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
> > at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
> > at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
> > at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
> > at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
> > at java.lang.Thread.run(Thread.java:724)
> > Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> > at org.apache.lucene.index.DocTermOrds.uninvert(DocTermOrds.java:405)
> > at org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:179)
> > at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:664)
> > at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:426)
> > at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:517)
> > at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:252)
> > at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:78)
> > at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
> > at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
> > at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> > ... 16 more
> > </str>
> >
> > ---
> > Nov 08, 2013 1:39:26 PM org.apache.solr.request.UnInvertedField <init>
> > INFO: UnInverted multi-valued field {field=topicStr,
> > memSize=1,768,101,824,
> > tindexSize=86,028,
> > time=45,854,
> > phase1=41,039,
> > nTerms=271,987,
> > bigTerms=0,
> > termInstances=569,429,716,
> > uses=0}
> > Nov 08, 2013 1:39:28 PM org.apache.solr.core.SolrCore execute
> >
> > INFO: [core] webapp=/dev-3 path=/select params={facet=true&facet.mincount=100&indent=true&q=ocr:the&facet.limit=30&facet.field=topicStr&wt=xml} hits=138,605,690 status=0 QTime=49,797
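(Editor's note: a back-of-envelope read of the UnInvertedField stats above: memSize divided by termInstances gives roughly 3.1 bytes per term instance, so the ~1.65 GiB structure is dominated by the 569M term instances rather than the 272K unique terms. A sketch of the arithmetic, with the numbers copied from the INFO line; the exact accounting is internal to UnInvertedField.)

```java
public class UifStats {
    public static void main(String[] args) {
        // Numbers from the "UnInverted multi-valued field" INFO line above.
        long memSize = 1_768_101_824L;      // bytes reported as memSize
        long termInstances = 569_429_716L;  // total term occurrences uninverted
        long nTerms = 271_987L;             // unique terms in the field

        double gib = memSize / (1024.0 * 1024.0 * 1024.0);
        double bytesPerInstance = (double) memSize / termInstances;
        System.out.printf("%.2f GiB total, %.2f bytes per term instance, %d unique terms%n",
                gib, bytesPerInstance, nTerms);
    }
}
```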
> >
> >
> >
> > On Fri, Nov 8, 2013 at 2:01 PM, Yonik Seeley <yonik@heliosearch.com> wrote:
> >>
> >> On Fri, Nov 8, 2013 at 1:56 PM, Tom Burton-West <tburtonw@umich.edu> wrote:
> >> > When testing an index of about 200 million documents, when we do a
> >> > first faceting on one field (query appended below), the memory use
> >> > rises from about 2.5 GB to 13GB.  If I run GC after the query the
> >> > memory use goes down to about 3GB and subsequent queries don't
> >> > significantly increase the memory use.
> >>
> >> Is there a way to tell what the real max memory usage is?  I assume
> >> 13GB is just the peak heap usage, but that could include a lot of
> >> garbage.
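(Editor's note: one way to measure this from inside the JVM is the per-pool JMX beans, which track a peak usage per memory pool and can be reset between queries; a sketch using only java.lang.management, not Solr-specific.)

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public class HeapPeak {
    public static void main(String[] args) {
        long peakBytes = 0;
        // Each heap pool (eden, survivors, old gen) records its own peak usage;
        // summing them gives an upper bound on the heap high-water mark.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                peakBytes += pool.getPeakUsage().getUsed();
                pool.resetPeakUsage();  // start a fresh window, e.g. per query
            }
        }
        System.out.println("summed heap pool peaks: " + peakBytes + " bytes");
    }
}
```

Resetting the peaks before the first facet query and reading them afterwards would separate the uninvert spike from steady-state usage, though the peak still includes whatever garbage was live at the high-water moment.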
> >>
> >> -Yonik
> >> http://heliosearch.com -- making solr shine
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: dev-help@lucene.apache.org
> >>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>
