Return-Path: X-Original-To: apmail-lucene-solr-user-archive@minotaur.apache.org Delivered-To: apmail-lucene-solr-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id D4BB7DB14 for ; Wed, 8 Aug 2012 16:47:53 +0000 (UTC) Received: (qmail 25085 invoked by uid 500); 8 Aug 2012 16:47:50 -0000 Delivered-To: apmail-lucene-solr-user-archive@lucene.apache.org Received: (qmail 25035 invoked by uid 500); 8 Aug 2012 16:47:50 -0000 Mailing-List: contact solr-user-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: solr-user@lucene.apache.org Delivered-To: mailing list solr-user@lucene.apache.org Received: (qmail 25023 invoked by uid 99); 8 Aug 2012 16:47:50 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 08 Aug 2012 16:47:50 +0000 X-ASF-Spam-Status: No, hits=1.8 required=5.0 tests=FREEMAIL_ENVFROM_END_DIGIT,FSL_RCVD_USER,HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of cooljam2008@gmail.com designates 209.85.214.176 as permitted sender) Received: from [209.85.214.176] (HELO mail-ob0-f176.google.com) (209.85.214.176) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 08 Aug 2012 16:47:45 +0000 Received: by obbtb18 with SMTP id tb18so1716764obb.35 for ; Wed, 08 Aug 2012 09:47:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=lU5LYHRcmVUKw5C2NFYnWl2SKmOcAbJPAJyYDLdWI3Q=; b=xd0H80J8MwMtWFD1GAlP1JOOib6yryoPFoVIPXuibVuFBVuyRtlq52waeTCSw8sP+r gkvESdShqZ127kqAwdVJODkzs3Kf7G1BaqX6r4/eUO2j1OAcxt0NhJ4XodEGf78a7bsw 044hr+Okn0HZRjxJZBrQ9QgNIJCmpIMLZWccfdxO21ERO6o05qpNuDF53U+4N7iwXXB2 gQLhdpXJrIQYvfJ1DYoSyAZzGSc2Mz/ViGsm3XMo8X9mZ60VSXsdFABNuOv7CYs3xuwR vc+e1j9O99afpGcoy7BgoA+gwhtJIlhmwBF9rWNIaai0+N/XeyfVousTOGa1WFeZX+SA q+qA== MIME-Version: 1.0 Received: by 10.182.50.68 with SMTP id a4mr1409619obo.59.1344444444700; Wed, 08 Aug 2012 09:47:24 -0700 (PDT) Received: by 10.60.12.10 with HTTP; Wed, 8 Aug 2012 09:47:24 -0700 (PDT) In-Reply-To: References: <098EB346-79E8-4D14-B9EE-517A6EFD75B5@gmail.com> Date: Thu, 9 Aug 2012 00:47:24 +0800 Message-ID: Subject: Re: Recovery problem in solrcloud From: Jam Luo To: solr-user@lucene.apache.org, yonik@lucidimagination.com Content-Type: multipart/alternative; boundary=f46d04446bbd8bee2704c6c3ddcb X-Virus-Checked: Checked by ClamAV on apache.org --f46d04446bbd8bee2704c6c3ddcb Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable There are 400 million documents in a shard, a document is less then 1 kb. the data file _**.fdt is 149g. Does the recovering need large memory in downloading or after downloaded? I find some log before OOM as below: Aug 06, 2012 9:43:04 AM org.apache.solr.core.SolrCore execute INFO: [blog] webapp=3D/solr path=3D/select params=3D{sort=3DcreatedAt+desc&distrib=3Dfalse&collection=3Dtoday,blog&hl.= fl=3Dcontent&wt=3Djavabin&hl=3Dfalse&rows=3D10&version=3D2&f.content.hl.fra= gsize=3D0&fl=3Did&shard.url=3Dindex35:8983/solr/blog/&NOW=3D1344217556702&s= tart=3D0&q=3D((("somewordsA"+%26%26+"somewordsB"+%26%26+"somewordsC")+%26%2= 6+platform:abc)+||+id:"/")+%26%26+(createdAt:[2012-07-30T01:43:28.462Z+TO+2= 012-08-06T01:43:28.462Z])&_system=3Dbusiness&isShard=3Dtrue&fsv=3Dtrue&f.ti= tle.hl.fragsize=3D0} hits=3D0 status=3D0 QTime=3D95 Aug 06, 2012 9:43:05 AM org.apache.solr.core.SolrDeletionPolicy onInit INFO: SolrDeletionPolicy.onInit: commits:num=3D1 commit{dir=3D/home/ant/jetty/solr/data/index.20120801114027,segFN=3Dsegment= s_aui,generation=3D14058,filenames=3D[_cdnu_nrm.cfs, _cdnu_0.frq, segments_aui, _cdnu.fdt, _cdnu_nrm.cfe, _cdnu_0.tim, _cdnu.fdx, _cdnu.fnm, _cdnu_0.prx, _cdnu_0.tip, _cdnu.per] Aug 06, 2012 9:43:05 AM org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: newest commit =3D 14058 Aug 06, 2012 9:43:05 AM org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit{flags=3D0,version=3D0,optimize=3Dfalse,openSearcher=3Dtrue,waitSearc= her=3Dtrue,expungeDeletes=3Dfalse,softCommit=3Dfalse} Aug 06, 2012 9:43:05 AM org.apache.solr.search.SolrIndexSearcher INFO: Opening Searcher@13578a09 main Aug 06, 2012 9:43:05 AM org.apache.solr.core.QuerySenderListener newSearche= r INFO: QuerySenderListener sending requests to Searcher@13578a09main{StandardDirectoryReader(segments_aui:1269420 _cdnu(4.0):C457041702)} Aug 06, 2012 9:43:05 AM org.apache.solr.core.QuerySenderListener newSearche= r INFO: QuerySenderListener done. Aug 06, 2012 9:43:05 AM org.apache.solr.core.SolrCore registerSearcher INFO: [blog] Registered new searcher Searcher@13578a09main{StandardDirectoryReader(segments_aui:1269420 _cdnu(4.0):C457041702)} Aug 06, 2012 9:43:05 AM org.apache.solr.update.DirectUpdateHandler2 commit INFO: end_commit_flush Aug 06, 2012 9:43:05 AM org.apache.solr.update.processor.LogUpdateProcessor finish INFO: [blog] webapp=3D/solr path=3D/update params=3D{waitSearcher=3Dtrue&commit_end_point=3Dtrue&wt=3Djavabin&commit= =3Dtrue&version=3D2} {commit=3D} 0 1439 Aug 06, 2012 9:43:05 AM org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit{flags=3D0,version=3D0,optimize=3Dfalse,openSearcher=3Dtrue,waitSearc= her=3Dtrue,expungeDeletes=3Dfalse,softCommit=3Dfalse} Aug 06, 2012 9:43:05 AM org.apache.solr.search.SolrIndexSearcher INFO: Opening Searcher@1a630c4d main Aug 06, 2012 9:43:05 AM org.apache.solr.core.QuerySenderListener newSearche= r INFO: QuerySenderListener sending requests to Searcher@1a630c4dmain{StandardDirectoryReader(segments_aui:1269420 _cdnu(4.0):C457041702)} Aug 06, 2012 9:43:05 AM org.apache.solr.core.QuerySenderListener newSearche= r INFO: QuerySenderListener done. Aug 06, 2012 9:43:05 AM org.apache.solr.core.SolrCore registerSearcher INFO: [blog] Registered new searcher Searcher@1a630c4dmain{StandardDirectoryReader(segments_aui:1269420 _cdnu(4.0):C457041702)} Aug 06, 2012 9:43:05 AM org.apache.solr.update.DirectUpdateHandler2 commit INFO: end_commit_flush Aug 06, 2012 9:43:07 AM org.apache.solr.core.SolrCore execute INFO: [blog] webapp=3D/solr path=3D/select params=3D{sort=3DcreatedAt+desc&distrib=3Dfalse&collection=3Dtoday,blog&hl.= fl=3Dcontent&wt=3Djavabin&hl=3Dfalse&rows=3D10&version=3D2&f.content.hl.fra= gsize=3D0&fl=3Did&shard.url=3Dindex35:8983/solr/blog/&NOW=3D1344217558778&s= tart=3D0&_system=3Dbusiness&q=3D(((somewordsD)+%26%26+platform:(abc))+||+id= :"/")+%26%26+(createdAt:[2012-07-30T01:43:30.537Z+TO+2012-08-06T01:43:30.53= 7Z])&isShard=3Dtrue&fsv=3Dtrue&f.title.hl.fragsize=3D0} hits=3D0 status=3D0 QTime=3D490 Except this log, all of other are "path=3D/select ******" in a few minutes, there is no add documents request in this cluster in this time.Is that related to the OOM? This is live traffic, so I can't test it frequently, Tonight I add -XX:+HeapDumpOnOutOfMemoryError option, if this problem appear once again, I will get the heap dump, but I am not sure I can analyse it and get a result. I will ask for your help please. thanks 2012/8/8 Yonik Seeley > Stack trace looks normal - it's just a multi-term query instantiating > a bitset. The memory is being taken up somewhere else. > How many documents are in your index? > Can you get a heap dump or use some other memory profiler to see > what's taking up the space? > > > if I stop query more then ten minutes, the solr instance will start > normally. > > Maybe queries are piling up in threads before the server is ready to > handle them and then trying to handle them all at once gives an OOM? > Is this live traffic or a test? How many concurrent requests get sent? > > -Yonik > http://lucidimagination.com > > > On Wed, Aug 8, 2012 at 2:43 AM, Jam Luo wrote: > > Aug 06, 2012 10:05:55 AM org.apache.solr.common.SolrException log > > SEVERE: null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Ja= va > > heap space > > at > > > org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.j= ava:456) > > at > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.ja= va:284) > > at > > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHand= ler.java:1337) > > at > > > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484= ) > > at > > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:= 119) > > at > > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:49= 9) > > at > > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.j= ava:233) > > at > > > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.j= ava:1065) > > at > > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:41= 3) > > at > > > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.ja= va:192) > > at > > > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.ja= va:999) > > at > > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:= 117) > > at > > > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextH= andlerCollection.java:250) > > at > > > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollecti= on.java:149) > > at > > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.jav= a:111) > > at org.eclipse.jetty.server.Server.handle(Server.java:351) > > at > > > org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHtt= pConnection.java:454) > > at > > > org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHtt= pConnection.java:47) > > at > > > org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConne= ction.java:900) > > at > > > org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(Ab= stractHttpConnection.java:954) > > at > org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857) > > at > > org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) > > at > > > org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnec= tion.java:66) > > at > > > org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(Socket= Connector.java:254) > > at > > > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.ja= va:599) > > at > > > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.jav= a:534) > > at java.lang.Thread.run(Thread.java:722) > > Caused by: java.lang.OutOfMemoryError: Java heap space > > at org.apache.lucene.util.FixedBitSet.(FixedBitSet.java:5= 4) > > at > > > org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(MultiTer= mQueryWrapperFilter.java:104) > > at > > > org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(Constan= tScoreQuery.java:129) > > at > > > org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.j= ava:318) > > at > > org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:507) > > at > > org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280) > > at > > > org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.j= ava:1394) > > at > > > org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.ja= va:1269) > > at > > > org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:38= 4) > > at > > > org.apache.solr.handler.component.QueryComponent.process(QueryComponent.j= ava:420) > > at > > > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchH= andler.java:204) > > at > > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBa= se.java:129) > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1544) > > at > > > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.jav= a:442) > > at > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.ja= va:263) > > at > > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHand= ler.java:1337) > > at > > > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484= ) > > at > > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:= 119) > > at > > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:49= 9) > > at > > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.j= ava:233) > > at > > > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.j= ava:1065) > > at > > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:41= 3) > > at > > > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.ja= va:192) > > at > > > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.ja= va:999) > > at > > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:= 117) > > at > > > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextH= andlerCollection.java:250) > > at > > > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollecti= on.java:149) > > at > > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.jav= a:111) > > at org.eclipse.jetty.server.Server.handle(Server.java:351) > > at > > > org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHtt= pConnection.java:454) > > at > > > org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHtt= pConnection.java:47) > > at > > > org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConne= ction.java:900) > > > > This error often appear at the startup, no data write to the index= , > > but it have a lot of query request. if I stop query more then ten > > minutes, the solr instance will start normally. > > My index data in solr data directory is 200g+, RAM is 16g, jvm > > properties is > > -Xmx10g > > -Xss256k > > -Xmn512m > > -XX:+UseCompressedOops > > The OOM and the peer startup fail may be uncorrelated, but this tw= o > > things often happen in the same solr instance and the same time. > > > > I can provide the full log file if you want. > > > > thanks > > > > > > > > > > 2012/8/7 Mark Miller > > > >> Still no idea on the OOM - please send the stacktrace if you can. > >> > >> As for doing a replication recovery when it should not be necessary, > yonik > >> just committed a fix for that a bit ago. > >> > >> On Aug 7, 2012, at 9:41 AM, Mark Miller wrote: > >> > >> > > >> > On Aug 7, 2012, at 5:49 AM, Jam Luo wrote: > >> > > >> >> Hi > >> >> I have big index data files more then 200g, there are two solr > >> >> instance in a shard. leader startup and is ok, but the peer alway > OOM > >> >> when it startup. > >> > > >> > Can you share the OOM msg and stacktrace please? > >> > > >> >> The peer alway download index files from leader because > >> >> of recoveringAfterStartup property in RecoveryStrategy, total time > >> taken > >> >> for download : 2350 secs. if data of the peer is empty, it is ok, > but > >> the > >> >> leader and the peer have a same generation number, why the peer > >> >> do recovering? > >> > > >> > We are looking into this. > >> > > >> >> > >> >> thanks > >> >> cooljam > >> > > >> > - Mark Miller > >> > lucidimagination.com > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > >> - Mark Miller > >> lucidimagination.com > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > --f46d04446bbd8bee2704c6c3ddcb--