lucene-solr-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject 6.5.1. cloud went partially down
Date Mon, 08 May 2017 09:35:00 GMT
Hi,

Multiple 6.5.1 clouds / collections went down this weekend at around the same time; they all
share the same ZK quorum. The nodes stayed up but did not rejoin the cluster (they did not
find or connect to ZK).

This is what the log told us:

2017-05-06 18:58:34.893 WARN  (zkCallback-5-thread-9-processing-n:idx6.example.org:8983_solr)
[   ] o.a.s.c.c.ConnectionManager Watcher org.apache.solr.common.cloud.ConnectionManager@4f97bdad
name: ZooKeeperConnection Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search
got event WatchedEvent state:Disconnected type:None path:null path: null type: None
2017-05-06 18:58:34.893 WARN  (zkCallback-5-thread-9-processing-n:idx6.example.org:8983_solr)
[   ] o.a.s.c.c.ConnectionManager zkClient has disconnected
2017-05-06 18:58:35.001 WARN  (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr
x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) [c:search s:shard2 r:core_node6
x:search_shard2_replica3] o.a.s.c.c.ConnectionManager Watcher org.apache.solr.common.cloud.ConnectionManager@c226cc
name: ZooKeeperConnection Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search
got event WatchedEvent state:Disconnected type:None path:null path: null type: None
2017-05-06 18:58:35.010 WARN  (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr
x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) [c:search s:shard2 r:core_node6
x:search_shard2_replica3] o.a.s.c.c.ConnectionManager zkClient has disconnected
2017-05-06 18:58:45.360 WARN  (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr)
[   ] o.a.s.c.c.ConnectionManager Watcher org.apache.solr.common.cloud.ConnectionManager@4f97bdad
name: ZooKeeperConnection Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search
got event WatchedEvent state:Expired type:None path:null path: null type: None
2017-05-06 18:58:45.360 WARN  (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr)
[   ] o.a.s.c.c.ConnectionManager Our previous ZooKeeper session was expired. Attempting to
reconnect to recover relationship with ZooKeeper...
2017-05-06 18:58:45.380 WARN  (OverseerStateUpdate-97740792370385619-idx6.example.org:8983_solr-n_0000000558)
[   ] o.a.s.c.Overseer Solr cannot talk to ZK, exiting Overseer main queue loop
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
for /overseer/queue
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
        at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:339)
        at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:336)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
        at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:336)
        at org.apache.solr.cloud.DistributedQueue.fetchZkChildren(DistributedQueue.java:308)
        at org.apache.solr.cloud.DistributedQueue.firstChild(DistributedQueue.java:285)
        at org.apache.solr.cloud.DistributedQueue.firstElement(DistributedQueue.java:393)
        at org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:159)
        at org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:137)
        at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:180)
        at java.lang.Thread.run(Thread.java:745)
2017-05-06 18:58:45.381 WARN  (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr)
[   ] o.a.s.c.c.DefaultConnectionStrategy Connection expired - starting a new one...
2017-05-06 18:58:45.382 ERROR (OverseerExitThread) [   ] o.a.s.c.Overseer could not read the
data
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
for /overseer_elect/leader
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:356)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:353)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
        at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:353)
        at org.apache.solr.cloud.Overseer$ClusterStateUpdater.checkIfIamStillLeader(Overseer.java:287)
        at java.lang.Thread.run(Thread.java:745)
2017-05-06 18:58:46.453 WARN  (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr
x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) [c:search s:shard2 r:core_node6
x:search_shard2_replica3] o.a.s.c.c.ConnectionManager Watcher org.apache.solr.common.cloud.ConnectionManager@c226cc
name: ZooKeeperConnection Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search
got event WatchedEvent state:Expired type:None path:null path: null type: None
2017-05-06 18:58:46.453 WARN  (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr
x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) [c:search s:shard2 r:core_node6
x:search_shard2_replica3] o.a.s.c.c.ConnectionManager Our previous ZooKeeper session was expired.
Attempting to reconnect to recover relationship with ZooKeeper...
2017-05-06 18:58:46.460 WARN  (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr
x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) [c:search s:shard2 r:core_node6
x:search_shard2_replica3] o.a.s.c.c.DefaultConnectionStrategy Connection expired - starting
a new one...
2017-05-06 18:58:53.599 ERROR (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr)
[   ] o.a.s.c.ZkController :org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode
= NodeExists for /live_nodes/idx6.example.org:8983_solr
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
        at org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:526)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:523)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:466)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:453)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:430)
        at org.apache.solr.cloud.ZkController.createEphemeralLiveNode(ZkController.java:823)
        at org.apache.solr.cloud.ZkController.access$600(ZkController.java:120)
        at org.apache.solr.cloud.ZkController$1.command(ZkController.java:340)
        at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:168)
        at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:57)
        at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:142)
        at org.apache.solr.common.cloud.SolrZkClient$3.lambda$process$0(SolrZkClient.java:268)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

2017-05-06 18:58:53.599 ERROR (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr)
[   ] o.a.s.c.c.DefaultConnectionStrategy Reconnect to ZooKeeper failed:org.apache.solr.common.cloud.ZooKeeperException:

        at org.apache.solr.cloud.ZkController$1.command(ZkController.java:392)
        at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:168)
        at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:57)
        at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:142)
        at org.apache.solr.common.cloud.SolrZkClient$3.lambda$process$0(SolrZkClient.java:268)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
for /live_nodes/idx6.example.org:8983_solr
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
        at org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:526)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:523)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:466)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:453)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:430)
        at org.apache.solr.cloud.ZkController.createEphemeralLiveNode(ZkController.java:823)
        at org.apache.solr.cloud.ZkController.access$600(ZkController.java:120)
        at org.apache.solr.cloud.ZkController$1.command(ZkController.java:340)
        ... 10 more
2017-05-06 18:58:53.600 WARN  (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr)
[   ] o.a.s.c.c.DefaultConnectionStrategy Reconnect to ZooKeeper failed
2017-05-06 18:58:57.052 ERROR (qtp1873653341-14807) [   ] o.a.s.h.RequestHandlerBase org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /collections/search/state.json
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:356)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:353)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
        at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:353)
        at org.apache.solr.common.cloud.ZkStateReader.fetchCollectionState(ZkStateReader.java:1110)
        at org.apache.solr.common.cloud.ZkStateReader.forceUpdateCollection(ZkStateReader.java:321)
        at org.apache.solr.handler.admin.PrepRecoveryOp.execute(PrepRecoveryOp.java:102)
        at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:370)
        at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:388)
        at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:174)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
        at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:748)
        at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:729)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:510)

After that we occasionally see:

2017-05-06 18:58:59.079 ERROR (qtp1873653341-14989) [   ] o.a.s.s.HttpSolrCall null:org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /collections/search/state.json

We did a hard restart of Solr to bring everything back up. Is this a known issue?
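
For anyone who wants to check their own logs for the same Disconnected -> Expired sequence, here is a minimal sketch of how the session events could be pulled out of a Solr log. It is not part of Solr; the function name `session_events` and the shortened sample lines are assumptions made for illustration, and it only relies on the timestamp prefix and the `WatchedEvent state:` text visible in the log above.

```python
import re
from datetime import datetime

# Timestamp that starts each Solr log record (format taken from the log above).
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ")
# ZooKeeper session state as reported in the ConnectionManager messages.
STATE = re.compile(r"WatchedEvent state:(\w+)")


def session_events(lines):
    """Yield (timestamp, state) for every WatchedEvent found in the log lines.

    Records may be wrapped over several lines, so the most recent timestamp
    seen is attached to any state found on a following line.
    """
    current = None
    for line in lines:
        ts = TS.match(line)
        if ts:
            current = datetime.strptime(ts.group(1), "%Y-%m-%d %H:%M:%S.%f")
        state = STATE.search(line)
        if state and current is not None:
            yield current, state.group(1)


# Hypothetical, shortened sample based on the first watcher in the log above.
sample = """\
2017-05-06 18:58:34.893 WARN  (zkCallback-5-thread-9) [   ] o.a.s.c.c.ConnectionManager Watcher
got event WatchedEvent state:Disconnected type:None path:null
2017-05-06 18:58:45.360 WARN  (zkCallback-5-thread-8) [   ] o.a.s.c.c.ConnectionManager Watcher
got event WatchedEvent state:Expired type:None path:null
""".splitlines()

events = list(session_events(sample))
# Seconds between the first and the second session event in the sample.
gap = (events[1][0] - events[0][0]).total_seconds()
print(events[0][1], "->", events[1][1], "after", gap, "seconds")
```

With the timestamps from the log above, the first watcher goes from Disconnected to Expired in roughly 10.5 seconds. Running the same extraction over the logs of every node would show whether they all lost their session at the same moment.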

Thanks,
Markus
