lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: solrcloud used a lot of memory and memory keep increasing during long time run
Date Tue, 22 Dec 2015 05:24:00 GMT
bq: What do we gain from setting maxWarmingSearchers to a larger value

You really don't get _any_ value. That setting is a safety valve to
prevent runaway resource consumption; seeing this warning in your logs
means you're misconfiguring your system, and increasing the value is almost
totally useless. It simply makes no sense for your soft commit interval to be
shorter than your autowarming time: that's a ton of wasted work for no
purpose. It's highly unlikely that your users _really_ need 1.5-second
latency; my bet is 10-15 seconds would be fine. You know best, of course,
but this kind of requirement is often something that people _think_ they
need but really don't. It particularly amuses me when the time between
a document changing and any attempt to send it to Solr is minutes,
but the product manager insists that "Solr must show the doc within two
seconds of sending it to the index".
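To make the wasted-work point concrete, here is the arithmetic for the numbers reported in this thread (a rough model, assuming commits keep arriving while the previous searcher is still warming):

```python
import math

# Numbers reported later in this thread; the model itself is a rough sketch.
soft_commit_interval_s = 1.5   # a new searcher is requested every 1.5 s
warmup_time_s = 3.0            # each new searcher takes ~3 s to autowarm

# Each soft commit starts warming a new searcher before the previous one
# has finished, so warming searchers overlap and pile up toward the
# maxWarmingSearchers cap:
concurrent_warming = math.ceil(warmup_time_s / soft_commit_interval_s)
print(concurrent_warming)  # 2 searchers warming at any moment, plus the live one
```

With a commit interval longer than the warmup time, that ratio drops below 1 and warming searchers never overlap.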

It's often actually acceptable to tell your users "it may take up to a
minute for docs to become searchable". What's usually not acceptable is
unpredictability. But again, that's up to your product managers.
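For concreteness, a sketch of what a saner setup might look like in solrconfig.xml (the intervals here are illustrative, not tuned recommendations for any particular workload):

```xml
<!-- Hard commit: flush the transaction log regularly without opening a
     new searcher (no cache invalidation, no warming). -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit: controls search visibility. Keep this comfortably longer
     than the observed autowarming time. -->
<autoSoftCommit>
  <maxTime>15000</maxTime>
</autoSoftCommit>

<!-- Leave the safety valve at its default. -->
<maxWarmingSearchers>2</maxWarmingSearchers>
```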

bq: You mean if my custom SearchComponent opens a searcher, it will exceed the
limit set by maxWarmingSearchers?

Not at all. But if you don't close it properly (it's reference counted),
more and more searchers will stay open, chewing up memory. So you may
simply be failing to close them and seeing memory grow because of that.
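A minimal sketch of the reference-counting contract involved. The names mirror Solr's org.apache.solr.util.RefCounted and the getSearcher()/decref() pattern, but this class is a self-contained illustration, not Solr's actual implementation:

```java
// Self-contained sketch of reference counting; not Solr's real RefCounted.
final class RefCounted<T> {
    private final T resource;
    private int refCount = 1;     // the creator holds the initial reference
    private boolean closed = false;

    RefCounted(T resource) { this.resource = resource; }

    synchronized RefCounted<T> incref() { refCount++; return this; }

    // Only when the count reaches zero can the underlying searcher (and its
    // caches) actually be freed. A component that takes a reference but never
    // releases it keeps the searcher, and all its memory, alive forever.
    synchronized void decref() {
        if (--refCount == 0) closed = true;
    }

    synchronized boolean isClosed() { return closed; }

    T get() { return resource; }
}

public class SearcherRefDemo {
    public static void main(String[] args) {
        RefCounted<String> searcher = new RefCounted<>("searcher");

        searcher.incref();        // a custom component takes a reference
        try {
            searcher.get();       // ... use the searcher ...
        } finally {
            searcher.decref();    // and MUST release it, even on exceptions
        }

        searcher.decref();        // the creator releases its initial reference
        System.out.println(searcher.isClosed()); // prints "true"
    }
}
```

The try/finally shape is the important part: miss the decref on any code path and the old searcher can never close.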

Best,
Erick

On Mon, Dec 21, 2015 at 6:47 PM, zhenglingyun <konghuarukhr@163.com> wrote:

> Yes, I do have some custom "Tokenizer"s and "SearchComponent"s.
>
> Here is the screenshot:
>
>
> The number of open searchers keeps changing; this time it's 10.
>
> You mean if my custom SearchComponent opens a searcher, it will exceed
> the limit set by maxWarmingSearchers? I'll check that, thanks!
>
> I have to use a short commit interval: our application needs a near-real-time
> search service. But I'm not sure whether Solr can support NRT search in other
> ways. Can you give me some advice?
>
> The value of maxWarmingSearchers was copied from some example config, I
> think.
> I'll try setting it back to 2.
>
> What do we gain from setting maxWarmingSearchers to a larger value? I can't
> find the answer on Google or in the Apache Solr Reference Guide.
>
>
>
>
> On Dec 22, 2015, at 00:34, Erick Erickson <erickerickson@gmail.com> wrote:
>
> Do you have any custom components? Indeed, you shouldn't have
> that many searchers open. But could we see a screenshot? That's
> the best way to ensure that we're talking about the same thing.
>
> Your autocommit settings are really hurting you. Your commit interval
> should be as long as you can tolerate. At that kind of commit frequency,
> your caches are of very limited usefulness anyway, so you can pretty
> much shut them off. Every 1.5 seconds, they're invalidated totally.
>
> Upping maxWarmingSearchers is almost always a mistake. It's
> a safety valve that's there to prevent runaway resource
> consumption, and hitting it almost always means the system is misconfigured.
> I'd put it back to 2 and tune the rest of the system to avoid the warning
> rather than bumping it up.
>
> Best,
> Erick
>
> On Sun, Dec 20, 2015 at 11:43 PM, zhenglingyun <konghuarukhr@163.com>
> wrote:
>
> Just now I saw about 40 "Searchers@XXXX main" entries in the Solr web UI:
> collection -> Plugins/Stats -> CORE
>
> I think it’s abnormal!
>
> Soft commit is set to 1.5s, but warmup takes about 3s.
> Does that lead to so many searchers?
>
> maxWarmingSearchers is set to 4 in my solrconfig.xml;
> shouldn't that prevent Solr from creating more than 4 searchers?
>
>
>
> On Dec 21, 2015, at 14:43, zhenglingyun <konghuarukhr@163.com> wrote:
>
> Thanks Erick for pointing out that the memory changes in a sawtooth pattern.
> The problem troubling me is that the bottom of the sawtooth keeps
> increasing.
> And when old-generation usage exceeds the threshold set by CMS's
> CMSInitiatingOccupancyFraction, GC runs continuously and uses a lot of CPU
> cycles,
> but the used old-generation memory does not decrease.
>
> Following Rahul's advice, I decreased Xms and Xmx from 16G to 8G, and
> changed the JVM parameters from
>   -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
>   -XX:-CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70
>   -XX:+CMSParallelRemarkEnabled
> to
>   -XX:NewRatio=3
>   -XX:SurvivorRatio=4
>   -XX:TargetSurvivorRatio=90
>   -XX:MaxTenuringThreshold=8
>   -XX:+UseConcMarkSweepGC
>   -XX:+UseParNewGC
>   -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4
>   -XX:+CMSScavengeBeforeRemark
>   -XX:PretenureSizeThreshold=64m
>   -XX:+UseCMSInitiatingOccupancyOnly
>   -XX:CMSInitiatingOccupancyFraction=50
>   -XX:CMSMaxAbortablePrecleanTime=6000
>   -XX:+CMSParallelRemarkEnabled
>   -XX:+ParallelRefProcEnabled
>   -XX:-CMSConcurrentMTEnabled
> which is taken from bin/solr.in.sh.
> I hope this reduces GC pause times and the number of full GCs.
> And maybe the memory-growth problem will disappear if I'm lucky.
>
> After several days' running, the memory on one of my two servers increased
> to 90% again...
> (When Solr is freshly started, it uses less than 1G.)
>
> Following is the output of jstat -gccause -h5 <pid> 1000:
>
> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>    LGCC                 GCC
> 9.56   0.00   8.65  91.31  65.89  69379 3076.096 16563 1579.639 4655.735
> Allocation Failure   No GC
> 9.56   0.00  51.10  91.31  65.89  69379 3076.096 16563 1579.639 4655.735
> Allocation Failure   No GC
> 0.00   9.23  10.23  91.35  65.89  69380 3076.135 16563 1579.639 4655.774
> Allocation Failure   No GC
> 7.90   0.00   9.74  91.39  65.89  69381 3076.165 16564 1579.683 4655.848
> CMS Final Remark     No GC
> 7.90   0.00  67.45  91.39  65.89  69381 3076.165 16564 1579.683 4655.848
> CMS Final Remark     No GC
> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>    LGCC                 GCC
> 0.00   7.48  16.18  91.41  65.89  69382 3076.200 16565 1579.707 4655.908
> CMS Initial Mark     No GC
> 0.00   7.48  73.77  91.41  65.89  69382 3076.200 16565 1579.707 4655.908
> CMS Initial Mark     No GC
> 8.61   0.00  29.86  91.45  65.89  69383 3076.228 16565 1579.707 4655.936
> Allocation Failure   No GC
> 8.61   0.00  90.16  91.45  65.89  69383 3076.228 16565 1579.707 4655.936
> Allocation Failure   No GC
> 0.00   7.46  47.89  91.46  65.89  69384 3076.258 16565 1579.707 4655.966
> Allocation Failure   No GC
> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>    LGCC                 GCC
> 8.67   0.00  11.98  91.49  65.89  69385 3076.287 16565 1579.707 4655.995
> Allocation Failure   No GC
> 0.00  11.76   9.24  91.54  65.89  69386 3076.321 16566 1579.759 4656.081
> CMS Final Remark     No GC
> 0.00  11.76  64.53  91.54  65.89  69386 3076.321 16566 1579.759 4656.081
> CMS Final Remark     No GC
> 7.25   0.00  20.39  91.57  65.89  69387 3076.358 16567 1579.786 4656.144
> CMS Initial Mark     No GC
> 7.25   0.00  81.56  91.57  65.89  69387 3076.358 16567 1579.786 4656.144
> CMS Initial Mark     No GC
> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>    LGCC                 GCC
> 0.00   8.05  34.42  91.60  65.89  69388 3076.391 16567 1579.786 4656.177
> Allocation Failure   No GC
> 0.00   8.05  84.17  91.60  65.89  69388 3076.391 16567 1579.786 4656.177
> Allocation Failure   No GC
> 8.54   0.00  55.14  91.62  65.89  69389 3076.420 16567 1579.786 4656.205
> Allocation Failure   No GC
> 0.00   7.74  12.42  91.66  65.89  69390 3076.456 16567 1579.786 4656.242
> Allocation Failure   No GC
> 9.60   0.00  11.00  91.70  65.89  69391 3076.492 16568 1579.841 4656.333
> CMS Final Remark     No GC
> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>    LGCC                 GCC
> 9.60   0.00  69.24  91.70  65.89  69391 3076.492 16568 1579.841 4656.333
> CMS Final Remark     No GC
> 0.00   8.70  18.21  91.74  65.89  69392 3076.529 16569 1579.870 4656.400
> CMS Initial Mark     No GC
> 0.00   8.70  61.92  91.74  65.89  69392 3076.529 16569 1579.870 4656.400
> CMS Initial Mark     No GC
> 7.36   0.00   3.49  91.77  65.89  69393 3076.570 16569 1579.870 4656.440
> Allocation Failure   No GC
> 7.36   0.00  42.03  91.77  65.89  69393 3076.570 16569 1579.870 4656.440
> Allocation Failure   No GC
> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>    LGCC                 GCC
> 0.00   9.77   0.00  91.80  65.89  69394 3076.604 16569 1579.870 4656.475
> Allocation Failure   No GC
> 9.08   0.00   9.92  91.82  65.89  69395 3076.632 16570 1579.913 4656.545
> CMS Final Remark     No GC
> 9.08   0.00  58.90  91.82  65.89  69395 3076.632 16570 1579.913 4656.545
> CMS Final Remark     No GC
> 0.00   8.44  16.20  91.86  65.89  69396 3076.664 16571 1579.930 4656.594
> CMS Initial Mark     No GC
> 0.00   8.44  71.95  91.86  65.89  69396 3076.664 16571 1579.930 4656.594
> CMS Initial Mark     No GC
> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>    LGCC                 GCC
> 8.11   0.00  30.59  91.90  65.89  69397 3076.694 16571 1579.930 4656.624
> Allocation Failure   No GC
> 8.11   0.00  93.41  91.90  65.89  69397 3076.694 16571 1579.930 4656.624
> Allocation Failure   No GC
> 0.00   9.77  57.34  91.96  65.89  69398 3076.724 16571 1579.930 4656.654
> Allocation Failure   No GC
>
> Full GC seems unable to free any more garbage (or is garbage being produced
> as fast as GC frees it?).
> On the other hand, the other replica of the collection, on the other server
> (the collection has two replicas),
> uses 40% of its old generation and doesn't trigger nearly as many full GCs.
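One way to read the jstat columns above (a rough interpretation, assuming the flags posted earlier in this thread):

```python
# "O" (old-gen occupancy) hovers around 91% in every sample, while
# -XX:CMSInitiatingOccupancyFraction=50 tells CMS to start a cycle at 50%.
cms_trigger_pct = 50.0
old_gen_used_pct = 91.3        # roughly constant across all samples above

# FGC grew from 16563 to 16571 over the ~20 one-second samples shown:
cms_cycles = 16571 - 16563
print(cms_cycles)  # 8 CMS cycles in about 20 seconds

# Old gen never drops below the trigger after a cycle, so CMS restarts
# immediately: the classic signature of a live set (not garbage)
# filling the old generation.
assert old_gen_used_pct > cms_trigger_pct
```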
>
>
> Following is the output of eclipse MAT leak suspects:
>
> Problem Suspect 1
>
> 4,741 instances of "org.apache.lucene.index.SegmentCoreReaders", loaded by
> "org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978" occupy
> 3,743,067,520 (64.12%) bytes. These instances are referenced from one
> instance of "java.lang.Object[]", loaded by "<system class loader>"
>
> Keywords
> java.lang.Object[]
> org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978
> org.apache.lucene.index.SegmentCoreReaders
>
> Details »
> Problem Suspect 2
>
> 2,815 instances of "org.apache.lucene.index.StandardDirectoryReader",
> loaded by "org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978"
> occupy 970,614,912 (16.63%) bytes. These instances are referenced from one
> instance of "java.lang.Object[]", loaded by "<system class loader>"
>
> Keywords
> java.lang.Object[]
> org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978
> org.apache.lucene.index.StandardDirectoryReader
>
> Details »
>
>
>
> Class structure in the above "Details":
>
> java.lang.Thread @XXX
>   <Java Local> java.util.ArrayList @XXXX
>       elementData java.lang.Object[3141] @XXXX
>           org.apache.lucene.search.FieldCache$CacheEntry @XXXX
>           org.apache.lucene.search.FieldCache$CacheEntry @XXXX
>           org.apache.lucene.search.FieldCache$CacheEntry @XXXX
>           …
> a lot of org.apache.lucene.search.FieldCache$CacheEntry (1205 in Suspect
> 1, 2785 in Suspect 2)
>
> Is it normal to have this many org.apache.lucene.search.FieldCache$CacheEntry
> instances?
>
> Thanks.
>
>
>
>
> On Dec 16, 2015, at 00:44, Erick Erickson <erickerickson@gmail.com> wrote:
>
> Rahul's comments were spot on. You can gain more confidence that this
> is normal by attaching a memory-reporting program (jconsole
> is one): you'll see the memory grow for quite a while, then garbage
> collection kicks in and it drops in a sawtooth pattern.
>
> Best,
> Erick
>
> On Tue, Dec 15, 2015 at 8:19 AM, zhenglingyun <konghuarukhr@163.com>
> wrote:
>
> Thank you very much.
> I will try reducing the heap memory and check whether the memory still
> keeps increasing.
>
> On Dec 15, 2015, at 19:37, Rahul Ramesh <rr.iiitb@gmail.com> wrote:
>
> You should actually decrease the Solr heap size. Let me explain a bit.
>
> Solr itself requires relatively little heap memory; the bulk of the memory
> is better spent holding index data in main memory, because Solr uses mmap
> for its index files.
> Please check the link
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> for an explanation of how Solr operates on files.
>
> Solr has the typical garbage-collection problem once you set the heap to a
> large value: it will have unpredictable pauses due to GC. The right amount
> of heap memory is difficult to predict. The way we tuned this
> parameter was to start with a low value and increase it by 1GB whenever
> an OOM was thrown.
>
> Please read about the problems of a large Java heap:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
>
>
> Just for your reference: in our production setup we have around 60GB of
> data per node, spread across 25 collections. We have configured an 8GB heap
> and leave the rest of the memory for the OS to manage. We do around 1000
> (searches + inserts) per second on this data.
>
> I hope this helps.
>
> Regards,
> Rahul
>
>
>
> On Tue, Dec 15, 2015 at 4:33 PM, zhenglingyun <konghuarukhr@163.com>
> wrote:
>
> Hi, list
>
> I'm new to Solr. Recently I encountered a "memory leak" problem with
> SolrCloud.
>
> I have two 64GB servers running a SolrCloud cluster. In the cluster I have
> one collection with about 400k docs. The index size of the collection is
> about 500MB. Memory for Solr is 16GB.
>
> Following is "ps aux | grep solr":
>
> /usr/java/jdk1.7.0_67-cloudera/bin/java
>
> -Djava.util.logging.config.file=/var/lib/solr/tomcat-deployment/conf/logging.properties
> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true
> -Dsolr.hdfs.blockcache.direct.memory.allocation=true
> -Dsolr.hdfs.blockcache.blocksperbank=16384
> -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264
> -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
> -Xloggc:/var/log/solr/gc.log
> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh
> -DzkHost=
> bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,
> bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,
> bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr
> -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr
>
> -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
> -Dsolr.authentication.simple.anonymous.allowed=true
> -Dsolr.security.proxyuser.hue.hosts=*
> -Dsolr.security.proxyuser.hue.groups=* -Dhost=
> bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983 -Dsolr.host=
> bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983
>
> -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
> -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984
> -Dsolr.max.connector.thread=10000 -Dsolr.solr.home=/var/lib/solr
> -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true
> -Dsolr.hdfs.blockcache.direct.memory.allocation=true
> -Dsolr.hdfs.blockcache.blocksperbank=16384
> -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264
> -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
> -Xloggc:/var/log/solr/gc.log
> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh
> -DzkHost=
> bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,
> bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,
> bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr
> -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr
>
> -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
> -Dsolr.authentication.simple.anonymous.allowed=true
> -Dsolr.security.proxyuser.hue.hosts=*
> -Dsolr.security.proxyuser.hue.groups=* -Dhost=
> bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983 -Dsolr.host=
> bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983
>
> -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
> -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984
> -Dsolr.max.connector.thread=10000 -Dsolr.solr.home=/var/lib/solr
> -Djava.endorsed.dirs=/usr/lib/bigtop-tomcat/endorsed -classpath
> /usr/lib/bigtop-tomcat/bin/bootstrap.jar
> -Dcatalina.base=/var/lib/solr/tomcat-deployment
> -Dcatalina.home=/usr/lib/bigtop-tomcat -Djava.io.tmpdir=/var/lib/solr/
> org.apache.catalina.startup.Bootstrap start
>
>
> Solr version is 4.4.0-cdh5.3.0.
> JDK version is 1.7.0_67.
>
> Soft commit time is 1.5s, and we index/partially update about 100 docs
> per second in real time.
>
> When freshly started, Solr uses about 500MB of memory (the figure shown in
> the Solr UI panel).
> After several days of running, Solr hits long GC pauses and stops
> responding to user queries.
>
> While Solr runs, its memory usage keeps climbing to some large value, drops
> to a low level (because of GC), climbs to a larger value, drops again...
> with each cycle peaking higher, until Solr stops responding and I have to
> restart it.
>
>
> I don't know how to solve this problem. Can you give me some advice?
>
> Thanks.
>
