lucene-solr-user mailing list archives

From KNitin <nitin.t...@gmail.com>
Subject Re: SolrCloud shard marked as down and "reloading" collection doesn't restore it
Date Fri, 12 Feb 2016 00:37:28 GMT
After more debugging, I figured out that it is related to this issue:
https://issues.apache.org/jira/browse/SOLR-3274

Is there a recommended fix (apart from running a ZooKeeper ensemble)?
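
For reference, since SOLR-3274 appears to come down to the ZooKeeper session
expiring or the connection dropping (a long pause during a large commit can
expire the session and leave the replica marked "down"), the mitigation I
have seen suggested besides moving to an external ensemble is raising
zkClientTimeout. A minimal sketch, assuming the new-style solr.xml layout
and that 30 seconds is acceptable for my setup:

    <solr>
      <solrcloud>
        <str name="zkHost">${zkHost:}</str>
        <int name="hostPort">${jetty.port:8983}</int>
        <!-- raised from the default so a pause during a large commit does
             not expire the ZooKeeper session and mark the replica down -->
        <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
      </solrcloud>
    </solr>

As far as I know the ZooKeeper server also caps the negotiated session
timeout via maxSessionTimeout (20x tickTime by default), so zoo.cfg may need
a matching bump for a larger value to take effect.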

On Thu, Feb 11, 2016 at 10:29 AM, KNitin <nitin.tnvl@gmail.com> wrote:

> Hi,
>
>  I noticed while running an indexing job (2M docs, but each doc can be
> 2-3 MB) that one of the shards goes down just after the commit (not
> related to OOM or high CPU/load). This marks the shard as "down" in
> ZooKeeper, and even a reload of the collection does not recover the state.
>
> There are no exceptions in the logs, and the stack trace shows Jetty
> threads in a blocked state.
>
> The last few lines in the logs are as follows:
>
> trib=TOLEADER&wt=javabin&version=2} {add=[1552605 (1525453861590925312)]}
> 0 5
> INFO  - 2016-02-06 19:17:47.658;
> org.apache.solr.update.DirectUpdateHandler2; start
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> INFO  - 2016-02-06 19:18:02.209; org.apache.solr.core.SolrDeletionPolicy;
> SolrDeletionPolicy.onCommit: commits: num=2
> INFO  - 2016-02-06 19:18:02.209; org.apache.solr.core.SolrDeletionPolicy;
> newest commit generation = 6
> INFO  - 2016-02-06 19:18:02.233; org.apache.solr.search.SolrIndexSearcher;
> Opening Searcher@321a0cc9 main
> INFO  - 2016-02-06 19:18:02.296; org.apache.solr.core.QuerySenderListener;
> QuerySenderListener sending requests to Searcher@321a0cc9
> main{StandardDirectoryReader(segments_6:180:nrt
> _20(4.6):C15155/216:delGen=1 _w(4.6):C1538/63:delGen=2
> _16(4.6):C279/20:delGen=2 _e(4.6):C11386/514:delGen=3
> _g(4.6):C4434/204:delGen=3 _p(4.6):C418/5:delGen=1 _v(4.6):C1
> _x(4.6):C17583/316:delGen=2 _y(4.6):C9783/112:delGen=2
> _z(4.6):C4736/47:delGen=2 _12(4.6):C705/2:delGen=1 _13(4.6):C275/4:delGen=1
> _1b(4.6):C619 _26(4.6):C318/13:delGen=1 _1e(4.6):C25356/763:delGen=3
> _1f(4.6):C13024/426:delGen=2 _1g(4.6):C5368/142:delGen=2
> _1j(4.6):C499/16:delGen=2 _1m(4.6):C448/23:delGen=2
> _1p(4.6):C236/17:delGen=2 _1k(4.6):C173/5:delGen=1
> _1s(4.6):C1082/78:delGen=2 _1t(4.6):C195/17:delGen=2 _1u(4.6):C2
> _21(4.6):C16494/1278:delGen=1 _22(4.6):C5193/398:delGen=1
> _23(4.6):C1361/102:delGen=1 _24(4.6):C475/36:delGen=1
> _29(4.6):C126/11:delGen=1 _2d(4.6):C97/3:delGen=1 _27(4.6):C59/7:delGen=1
> _28(4.6):C26/6:delGen=1 _2b(4.6):C40 _25(4.6):C39/1:delGen=1
> _2c(4.6):C139/9:delGen=1 _2a(4.6):C26/6:delGen=1)}
>
>
> The only solution is to restart the cluster. Why does a reload not work,
> and is this a known bug (for which there is a patch I can apply)?
>
> Any pointers are much appreciated
>
> Thanks!
> Nitin
>
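
For completeness, the "reload" referred to above was the standard
Collections API RELOAD call; roughly this (host, port, and collection name
are placeholders):

    # reload the collection on all nodes (placeholder host/collection)
    curl 'http://solr-host:8983/solr/admin/collections?action=RELOAD&name=my_collection'

    # the only thing that actually recovered the replica was restarting the
    # Solr JVMs, which re-registers the cores with ZooKeeper on startup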
