lucene-solr-user mailing list archives

From Jam Luo <cooljam2...@gmail.com>
Subject Re: Recovery problem in solrcloud
Date Wed, 08 Aug 2012 16:47:24 GMT
There are 400 million documents in a shard, and each document is less than 1 KB.
The data file _**.fdt is 149 GB.
Does recovery need a large amount of memory while downloading, or after the download has finished?

I found some log entries from before the OOM, shown below:
Aug 06, 2012 9:43:04 AM org.apache.solr.core.SolrCore execute
INFO: [blog] webapp=/solr path=/select
params={sort=createdAt+desc&distrib=false&collection=today,blog&hl.fl=content&wt=javabin&hl=false&rows=10&version=2&f.content.hl.fragsize=0&fl=id&shard.url=index35:8983/solr/blog/&NOW=1344217556702&start=0&q=((("somewordsA"+%26%26+"somewordsB"+%26%26+"somewordsC")+%26%26+platform:abc)+||+id:"/")+%26%26+(createdAt:[2012-07-30T01:43:28.462Z+TO+2012-08-06T01:43:28.462Z])&_system=business&isShard=true&fsv=true&f.title.hl.fragsize=0}
hits=0 status=0 QTime=95
Aug 06, 2012 9:43:05 AM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1

commit{dir=/home/ant/jetty/solr/data/index.20120801114027,segFN=segments_aui,generation=14058,filenames=[_cdnu_nrm.cfs,
_cdnu_0.frq, segments_aui, _cdnu.fdt, _cdnu_nrm.cfe, _cdnu_0.tim,
_cdnu.fdx, _cdnu.fnm, _cdnu_0.prx, _cdnu_0.tip, _cdnu.per]
Aug 06, 2012 9:43:05 AM org.apache.solr.core.SolrDeletionPolicy
updateCommits
INFO: newest commit = 14058
Aug 06, 2012 9:43:05 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start
commit{flags=0,version=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
Aug 06, 2012 9:43:05 AM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening Searcher@13578a09 main
Aug 06, 2012 9:43:05 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to
Searcher@13578a09main{StandardDirectoryReader(segments_aui:1269420
_cdnu(4.0):C457041702)}
Aug 06, 2012 9:43:05 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Aug 06, 2012 9:43:05 AM org.apache.solr.core.SolrCore registerSearcher
INFO: [blog] Registered new searcher
Searcher@13578a09main{StandardDirectoryReader(segments_aui:1269420
_cdnu(4.0):C457041702)}
Aug 06, 2012 9:43:05 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Aug 06, 2012 9:43:05 AM org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: [blog] webapp=/solr path=/update
params={waitSearcher=true&commit_end_point=true&wt=javabin&commit=true&version=2}
{commit=} 0 1439
Aug 06, 2012 9:43:05 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start
commit{flags=0,version=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
Aug 06, 2012 9:43:05 AM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening Searcher@1a630c4d main
Aug 06, 2012 9:43:05 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to
Searcher@1a630c4dmain{StandardDirectoryReader(segments_aui:1269420
_cdnu(4.0):C457041702)}
Aug 06, 2012 9:43:05 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Aug 06, 2012 9:43:05 AM org.apache.solr.core.SolrCore registerSearcher
INFO: [blog] Registered new searcher
Searcher@1a630c4dmain{StandardDirectoryReader(segments_aui:1269420
_cdnu(4.0):C457041702)}
Aug 06, 2012 9:43:05 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Aug 06, 2012 9:43:07 AM org.apache.solr.core.SolrCore execute
INFO: [blog] webapp=/solr path=/select
params={sort=createdAt+desc&distrib=false&collection=today,blog&hl.fl=content&wt=javabin&hl=false&rows=10&version=2&f.content.hl.fragsize=0&fl=id&shard.url=index35:8983/solr/blog/&NOW=1344217558778&start=0&_system=business&q=(((somewordsD)+%26%26+platform:(abc))+||+id:"/")+%26%26+(createdAt:[2012-07-30T01:43:30.537Z+TO+2012-08-06T01:43:30.537Z])&isShard=true&fsv=true&f.title.hl.fragsize=0}
hits=0 status=0 QTime=490

Apart from this log, everything else within those few minutes is
"path=/select ******"; there were no add-document requests in the cluster
during that time. Is that related to the OOM?
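
For a rough sense of scale, here is a back-of-the-envelope sketch of how much heap one such bitset might take. It assumes, as the stack trace quoted below suggests but which I have not verified in the Lucene source, that each multi-term query (for example the createdAt range) allocates a bitset with one bit per document in the segment. The class and method names are made up for illustration; the document count comes from C457041702 in the log above and the heap size from -Xmx10g.

```java
public class BitSetHeapEstimate {
    // Number of 64-bit words needed to hold numBits bits (one bit per document).
    static long wordsFor(long numBits) {
        return (numBits + 63) / 64;
    }

    // Bytes used by the backing long[] array, ignoring object and array headers.
    static long bytesFor(long numBits) {
        return wordsFor(numBits) * 8;
    }

    public static void main(String[] args) {
        long maxDoc = 457_041_702L;                 // from "_cdnu(4.0):C457041702" in the log
        long heapBytes = 10L * 1024 * 1024 * 1024;  // -Xmx10g, assuming the whole heap were free
        long perBitSet = bytesFor(maxDoc);
        System.out.println("bytes per bitset: " + perBitSet);
        System.out.println("bitsets that fit in the heap: " + heapBytes / perBitSet);
    }
}
```

By this estimate each bitset is roughly 57 MB, so fewer than two hundred range queries running at the same time could in principle fill a 10 GB heap on their own, before counting caches or anything else.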

This is live traffic, so I can't test it frequently. Tonight I added the
-XX:+HeapDumpOnOutOfMemoryError
option; if this problem appears again, I will get the heap dump, but I
am not sure I can analyse it and reach a conclusion, so I may ask for your
help.

thanks

2012/8/8 Yonik Seeley <yonik@lucidimagination.com>

> Stack trace looks normal - it's just a multi-term query instantiating
> a bitset.  The memory is being taken up somewhere else.
> How many documents are in your index?
> Can you get a heap dump or use some other memory profiler to see
> what's taking up the space?
>
> > if I stop query more then  ten minutes, the solr instance will start
> normally.
>
> Maybe queries are piling up in threads before the server is ready to
> handle them and then trying to handle them all at once gives an OOM?
> Is this live traffic or a test?  How many concurrent requests get sent?
>
> -Yonik
> http://lucidimagination.com
>
>
> On Wed, Aug 8, 2012 at 2:43 AM, Jam Luo <cooljam2008@gmail.com> wrote:
> > Aug 06, 2012 10:05:55 AM org.apache.solr.common.SolrException log
> > SEVERE: null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java
> > heap space
> >         at
> >
> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:456)
> >         at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:284)
> >         at
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
> >         at
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
> >         at
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
> >         at
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:499)
> >         at
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
> >         at
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
> >         at
> > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
> >         at
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
> >         at
> >
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
> >         at
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
> >         at
> >
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
> >         at
> >
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
> >         at
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
> >         at org.eclipse.jetty.server.Server.handle(Server.java:351)
> >         at
> >
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
> >         at
> >
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
> >         at
> >
> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
> >         at
> >
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
> >         at
> org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
> >         at
> > org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> >         at
> >
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
> >         at
> >
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
> >         at
> >
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
> >         at
> >
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
> >         at java.lang.Thread.run(Thread.java:722)
> > Caused by: java.lang.OutOfMemoryError: Java heap space
> >         at org.apache.lucene.util.FixedBitSet.<init>(FixedBitSet.java:54)
> >         at
> >
> org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(MultiTermQueryWrapperFilter.java:104)
> >         at
> >
> org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(ConstantScoreQuery.java:129)
> >         at
> >
> org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:318)
> >         at
> > org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:507)
> >         at
> > org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> >         at
> >
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1394)
> >         at
> >
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1269)
> >         at
> >
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:384)
> >         at
> >
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:420)
> >         at
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
> >         at
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> >         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1544)
> >         at
> >
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
> >         at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
> >         at
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
> >         at
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
> >         at
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
> >         at
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:499)
> >         at
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
> >         at
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
> >         at
> > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
> >         at
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
> >         at
> >
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
> >         at
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
> >         at
> >
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
> >         at
> >
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
> >         at
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
> >         at org.eclipse.jetty.server.Server.handle(Server.java:351)
> >         at
> >
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
> >         at
> >
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
> >         at
> >
> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
> >
> >     This error often appear  at the startup, no data write to the index,
> > but  it have a lot of query request. if I stop query more then  ten
> > minutes, the solr instance will start normally.
> >     My index data in solr data directory  is 200g+,  RAM is 16g, jvm
> > properties is
> >  -Xmx10g
> >  -Xss256k
> >  -Xmn512m
> >  -XX:+UseCompressedOops
> >     The OOM and the peer startup fail may be uncorrelated,  but this two
> > things often happen in the same solr instance and the same time.
> >
> >     I can provide the full log file if you want.
> >
> > thanks
> >
> >
> >
> >
> > 2012/8/7 Mark Miller <markrmiller@gmail.com>
> >
> >> Still no idea on the OOM - please send the stacktrace if you can.
> >>
> >> As for doing a replication recovery when it should not be necessary,
> yonik
> >> just committed a fix for that a bit ago.
> >>
> >> On Aug 7, 2012, at 9:41 AM, Mark Miller <markrmiller@gmail.com> wrote:
> >>
> >> >
> >> > On Aug 7, 2012, at 5:49 AM, Jam Luo <cooljam2008@gmail.com> wrote:
> >> >
> >> >> Hi
> >> >>   I have  big index data files  more then 200g, there are two solr
> >> >> instance in a shard.  leader startup and is ok, but the peer alway
> OOM
> >> >> when  it startup.
> >> >
> >> > Can you share the OOM msg and stacktrace please?
> >> >
> >> >> The peer alway download index files from leader because
> >> >> of  recoveringAfterStartup property in RecoveryStrategy, total time
> >> taken
> >> >> for download : 2350 secs.  if  data of the peer is empty, it is ok,
> but
> >> the
> >> >> leader and the peer have a same generation number,  why the peer
> >> >> do recovering?
> >> >
> >> > We are looking into this.
> >> >
> >> >>
> >> >> thanks
> >> >> cooljam
> >> >
> >> > - Mark Miller
> >> > lucidimagination.com
> >> >
> >>
> >> - Mark Miller
> >> lucidimagination.com
> >>
>
