lucene-solr-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: Nodes cannot recover and become unavailable
Date Wed, 19 Sep 2012 20:25:40 GMT
bq. I believe there were some changes made to the clusterstate.json
recently that are not backwards compatible.

Indeed - I think Yonik committed something the other day - we should
probably send an email out about this. I'm not sure exactly how easy an
upgrade is or what steps to take. It may be something like stopping your
whole cluster and deleting clusterstate.json, after which it works, or
it may take more or less than that. I don't know whether that's the
issue here, but it's likely an issue.
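
Before resetting any cluster state, it can help to see which znodes the quoted log reports as missing. The following is a minimal sketch, not part of Solr or ZooKeeper: the class name MissingZnodeReport and the method countMissing are hypothetical, and it assumes the log contains ZooKeeper's "NoNode for <path>" message text as in the trace quoted below. It uses only the Java standard library.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for log triage; not part of Solr or ZooKeeper.
public class MissingZnodeReport {

    // Matches the tail of a NoNodeException message, for example
    // "KeeperErrorCode = NoNode for /collections/oi/leaders/shard9".
    // Only the "NoNode for <path>" part is matched, so the pattern still
    // works when the mail client wrapped the line before the "=".
    private static final Pattern NO_NODE = Pattern.compile("NoNode for (\\S+)");

    // Counts how often each znode path is reported missing, in first-seen order.
    public static Map<String, Integer> countMissing(List<String> logLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = NO_NODE.matcher(line);
            while (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the path to a Solr log file is passed as the first argument.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        countMissing(lines).forEach((path, n) -> System.out.println(n + "x " + path));
    }
}
```

If every reported path sits under a single shard's leaders node, that would suggest the problem is a missing leader registration for that shard rather than a general ZooKeeper connectivity failure.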

On Wed, Sep 19, 2012 at 8:41 AM, Sami Siren <ssiren@gmail.com> wrote:
> Also, did you re-create the cluster after upgrading to a newer
> version? I believe there were some changes made to the
> clusterstate.json recently that are not backwards compatible.
>
> --
>  Sami Siren
>
>
>
> On Wed, Sep 19, 2012 at 6:21 PM, Sami Siren <ssiren@gmail.com> wrote:
>> Hi,
>>
>> I am having trouble understanding the reason for that NPE.
>>
>> First you could try removing line 102 in HttpClientUtil so
>> that logging does not prevent creation of the http client in
>> SyncStrategy.
>>
>> --
>>  Sami Siren
>>
>> On Wed, Sep 19, 2012 at 5:29 PM, Markus Jelsma
>> <markus.jelsma@openindex.io> wrote:
>>> Hi,
>>>
>>> Since the 2012-09-17 11:10:41 build, shards have had trouble coming back
>>> online. When I restart one node, the slices on the other nodes throw
>>> exceptions and cannot be queried. I'm not sure how to remedy the problem,
>>> but stopping a node or restarting it a few times seems to help. When this
>>> happens after restarting a node, I must not restart another node, because
>>> that may cause other slices to become unavailable.
>>>
>>> Here are some parts of the log:
>>>
>>> 2012-09-19 14:13:18,149 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - trying again... core=oi_i
>>> 2012-09-19 14:13:25,818 WARN [solr.cloud.RecoveryStrategy] - [main-EventThread] - : Stopping recovery for zkNodeName=nl10.host:8080_solr_oi_icore=oi_i
>>> 2012-09-19 14:13:44,497 WARN [solr.cloud.RecoveryStrategy] - [Thread-4] - : Stopping recovery for zkNodeName=nl10.host:8080_solr_oi_jcore=oi_j
>>> 2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Error while trying to recover. core=oi_i:org.apache.solr.common.SolrException: We are not the leader
>>>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402)
>>>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
>>>         at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:199)
>>>         at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:388)
>>>         at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
>>>
>>> 2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - trying again... core=oi_i
>>> 2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - max retries exceeded. core=oi_i
>>> 2012-09-19 14:14:00,321 ERROR [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Recovery failed - I give up. core=oi_i
>>> 2012-09-19 14:14:00,333 WARN [solr.cloud.RecoveryStrategy] - [RecoveryThread] - : Stopping recovery for zkNodeName=nl10.host:8080_solr_oi_icore=oi_i
>>>  ERROR [solr.cloud.SyncStrategy] - [main-EventThread] - : Sync request error: java.lang.NullPointerException
>>>  ERROR [solr.cloud.SyncStrategy] - [main-EventThread] - : http://nl10.host:8080/solr/oi_i/: Could not tell a replica to recover:java.lang.NullPointerException
>>>         at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:305)
>>>         at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:102)
>>>         at org.apache.solr.client.solrj.impl.HttpSolrServer.<init>(HttpSolrServer.java:155)
>>>         at org.apache.solr.client.solrj.impl.HttpSolrServer.<init>(HttpSolrServer.java:128)
>>>         at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:262)
>>>         at org.apache.solr.cloud.SyncStrategy.requestRecovery(SyncStrategy.java:272)
>>>         at org.apache.solr.cloud.SyncStrategy.syncToMe(SyncStrategy.java:203)
>>>         at org.apache.solr.cloud.SyncStrategy.syncReplicas(SyncStrategy.java:125)
>>>         at org.apache.solr.cloud.SyncStrategy.sync(SyncStrategy.java:87)
>>>         at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:169)
>>>         at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:158)
>>>         at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:102)
>>>         at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:275)
>>>         at org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:326)
>>>         at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:159)
>>>         at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:158)
>>>         at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:102)
>>>         at org.apache.solr.cloud.LeaderElector.access$000(LeaderElector.java:56)
>>>         at org.apache.solr.cloud.LeaderElector$1.process(LeaderElector.java:131)
>>>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
>>>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
>>>
>>>  ERROR [apache.zookeeper.ClientCnxn] - [main-EventThread] - : Error while calling watcher
>>> java.lang.NullPointerException
>>>         at org.apache.solr.cloud.LeaderElector$1.process(LeaderElector.java:139)
>>>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
>>>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
>>>  ERROR [apache.zookeeper.ClientCnxn] - [main-EventThread] - : Error while calling watcher
>>> java.lang.NullPointerException
>>>         at org.apache.solr.common.cloud.ZkStateReader$3.process(ZkStateReader.java:238)
>>>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
>>>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
>>>  ERROR [apache.zookeeper.ClientCnxn] - [main-EventThread] - : Error while calling watcher
>>> java.lang.NullPointerException
>>>         at org.apache.solr.common.cloud.ZkStateReader$2.process(ZkStateReader.java:189)
>>>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
>>>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
>>> 2012-09-19 14:14:05,304 WARN [solr.core.CoreContainer] - [main] - : Log watching is not yet implemented for log4j
>>> 2012-09-19 14:14:08,504 WARN [solr.core.SolrCore] - [main] - : New index directory detected: old=null new=/opt/solr/cores/oi_j/data/index.20120823134824608
>>> 2012-09-19 14:14:10,895 WARN [solr.core.SolrCore] - [main] - : New index directory detected: old=null new=/opt/solr/cores/oi_i/data/index/
>>> 2012-09-19 14:14:41,203 ERROR [solr.cloud.ZkController] - [main] - : Error getting leader from zk
>>> org.apache.solr.common.SolrException: Could not get leader props
>>>         at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:762)
>>>         at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:722)
>>>         at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:687)
>>>         at org.apache.solr.cloud.ZkController.register(ZkController.java:626)
>>>         at org.apache.solr.cloud.ZkController.register(ZkController.java:576)
>>>         at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:721)
>>>         at org.apache.solr.core.CoreContainer.register(CoreContainer.java:705)
>>>         at org.apache.solr.core.CoreContainer.load(CoreContainer.java:547)
>>>         at org.apache.solr.core.CoreContainer.load(CoreContainer.java:365)
>>>         at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:314)
>>>         at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
>>>         at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
>>>         at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
>>>         at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:115)
>>>         at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
>>>         at org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
>>>         at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
>>>         at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
>>>         at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
>>>         at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
>>>         at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
>>>         at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
>>>         at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317)
>>>         at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
>>>         at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
>>>         at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
>>>         at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
>>>         at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057)
>>>         at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
>>>         at org.apache.catalina.core.StandardService.start(StandardService.java:525)
>>>         at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
>>>         at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>         at java.lang.reflect.Method.invoke(Method.java:616)
>>>         at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
>>>         at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections/oi/leaders/shard9
>>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
>>>         at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:301)
>>>         at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:298)
>>>         at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:67)
>>>         at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:298)
>>>         at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:740)
>>>         ... 37 more
>>> 2012-09-19 14:14:41,239 ERROR [solr.core.CoreContainer] - [main] - : :org.apache.solr.common.SolrException: Error getting leader from zk
>>>         at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:711)
>>>         at org.apache.solr.cloud.ZkController.register(ZkController.java:626)
>>>         at org.apache.solr.cloud.ZkController.register(ZkController.java:576)
>>>         at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:721)
>>>         at org.apache.solr.core.CoreContainer.register(CoreContainer.java:705)
>>>         at org.apache.solr.core.CoreContainer.load(CoreContainer.java:547)
>>>         at org.apache.solr.core.CoreContainer.load(CoreContainer.java:365)
>>>         at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:314)
>>>         at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
>>>         at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
>>>         at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
>>>         at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:115)
>>>         at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
>>>         at org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
>>>         at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
>>>         at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
>>>         at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
>>>         at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
>>>         at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
>>>         at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
>>>         at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317)
>>>         at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
>>>         at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
>>>         at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
>>>         at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
>>>         at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057)
>>>         at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
>>>         at org.apache.catalina.core.StandardService.start(StandardService.java:525)
>>>         at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
>>>         at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>         at java.lang.reflect.Method.invoke(Method.java:616)
>>>         at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
>>>         at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
>>> Caused by: org.apache.solr.common.SolrException: Could not get leader props
>>>         at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:762)
>>>         at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:722)
>>>         at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:687)
>>>         ... 35 more
>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections/oi/leaders/shard9
>>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
>>>         at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:301)
>>>         at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:298)
>>>         at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:67)
>>>         at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:298)
>>>         at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:740)
>>>         ... 37 more
>>> 2012-09-19 14:14:41,239 ERROR [solr.core.CoreContainer] - [main] - : null:org.apache.solr.common.cloud.ZooKeeperException:
>>>         at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:744)
>>>         at org.apache.solr.core.CoreContainer.register(CoreContainer.java:705)
>>>         at org.apache.solr.core.CoreContainer.load(CoreContainer.java:547)
>>>         at org.apache.solr.core.CoreContainer.load(CoreContainer.java:365)
>>>         at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:314)
>>>         at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
>>>         at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
>>>         at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
>>>         at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:115)
>>>         at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
>>>         at org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
>>>         at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
>>>         at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
>>>         at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
>>>         at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
>>>         at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
>>>         at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
>>>         at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317)
>>>         at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
>>>         at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
>>>         at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
>>>         at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
>>>         at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057)
>>>         at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
>>>         at org.apache.catalina.core.StandardService.start(StandardService.java:525)
>>>         at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
>>>         at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>         at java.lang.reflect.Method.invoke(Method.java:616)
>>>         at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
>>>         at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
>>> Caused by: org.apache.solr.common.SolrException: Error getting leader from zk
>>>         at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:711)
>>>         at org.apache.solr.cloud.ZkController.register(ZkController.java:626)
>>>         at org.apache.solr.cloud.ZkController.register(ZkController.java:576)
>>>         at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:721)
>>>         ... 32 more
>>> Caused by: org.apache.solr.common.SolrException: Could not get leader props
>>>         at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:762)
>>>         at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:722)
>>>         at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:687)
>>>         ... 35 more
>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections/oi/leaders/shard9
>>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
>>>         at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:301)
>>>         at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:298)
>>>         at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:67)
>>>         at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:298)
>>>         at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:740)
>>>         ... 37 more
>>>
>>> This does not happen with an older build from September 11th.
>>>
>>> Thanks,
>>> Markus



-- 
- Mark
