lucene-solr-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject RE: 6.5.1. cloud went partially down
Date Wed, 10 May 2017 10:44:22 GMT
I am not sure this is directly related, but we also sometimes see clients losing connections on 6.5.1.
This, together with the problem described below, is unique to 6.5.1. I have not seen this many
cloud issues in such a short time for a very long time.

2017-05-09 21:30:36.661 ERROR (Document compiler) [c:logs s:shard1 r:core_node1 x:logs_shard1_replica1]
o.a.s.c.s.i.CloudSolrClient Request to collection search failed due to (0) java.lang.IllegalStateException:
Connection pool shut down, retry? 0

Clients appear unable to recover from this problem. The cloud the clients are connecting to
is up and doing fine.

Any ideas?

Thanks,
Markus
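
Regarding the "Connection pool shut down" error above: one client-side stopgap that might be worth trying is to discard the failed client and build a fresh one when a request throws IllegalStateException. The sketch below is only an illustration under assumptions. RebuildingClient and its Request interface are hypothetical names, not part of SolrJ, and the factory stands for whatever code the application already uses to build its CloudSolrClient. It does not address why the pool was shut down in the first place.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of SolrJ): holds a client created by a factory
 * and rebuilds it once when a request fails with IllegalStateException, the
 * exception type shown in the log line above.
 */
final class RebuildingClient<C extends AutoCloseable> {

    /** A request against the client; E lets checked exceptions pass through. */
    interface Request<C, R, E extends Exception> {
        R apply(C client) throws E;
    }

    private final Supplier<C> factory; // assumed: builds a new, usable client
    private C client;

    RebuildingClient(Supplier<C> factory) {
        this.factory = factory;
        this.client = factory.get();
    }

    /** Runs the request; on IllegalStateException, replaces the client and retries once. */
    synchronized <R, E extends Exception> R call(Request<C, R, E> request) throws E {
        try {
            return request.apply(client);
        } catch (IllegalStateException e) {
            closeQuietly(client);
            client = factory.get();
            // A second failure is not caught here and propagates to the caller.
            return request.apply(client);
        }
    }

    private static void closeQuietly(AutoCloseable c) {
        try {
            c.close();
        } catch (Exception ignored) {
            // The old client is being discarded anyway.
        }
    }
}
```

The retry is limited to a single attempt on purpose, so a persistent failure still surfaces to the caller instead of looping.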

 
 
-----Original message-----
> From:Markus Jelsma <markus.jelsma@openindex.io>
> Sent: Monday 8th May 2017 11:35
> To: solr-user <solr-user@lucene.apache.org>
> Subject: 6.5.1. cloud went partially down
> 
> Hi,
> 
> Multiple 6.5.1 clouds / collections went down this weekend around the same time; they
> share the same ZK quorum. The nodes stayed up but did not rejoin the cluster (they could
> not find or connect to ZK).
> 
> This is what the log told us:
> 
> 2017-05-06 18:58:34.893 WARN  (zkCallback-5-thread-9-processing-n:idx6.example.org:8983_solr)
[   ] o.a.s.c.c.ConnectionManager Watcher org.apache.solr.common.cloud.ConnectionManager@4f97bdad
name: ZooKeeperConnection Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search
got event WatchedEvent state:Disconnected type:None path:null path: null type: None
> 2017-05-06 18:58:34.893 WARN  (zkCallback-5-thread-9-processing-n:idx6.example.org:8983_solr)
[   ] o.a.s.c.c.ConnectionManager zkClient has disconnected
> 2017-05-06 18:58:35.001 WARN  (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr
x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) [c:search s:shard2 r:core_node6
x:search_shard2_replica3] o.a.s.c.c.ConnectionManager Watcher org.apache.solr.common.cloud.ConnectionManager@c226cc
name: ZooKeeperConnection Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search
got event WatchedEvent state:Disconnected type:None path:null path: null type: None
> 2017-05-06 18:58:35.010 WARN  (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr
x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) [c:search s:shard2 r:core_node6
x:search_shard2_replica3] o.a.s.c.c.ConnectionManager zkClient has disconnected
> 2017-05-06 18:58:45.360 WARN  (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr)
[   ] o.a.s.c.c.ConnectionManager Watcher org.apache.solr.common.cloud.ConnectionManager@4f97bdad
name: ZooKeeperConnection Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search
got event WatchedEvent state:Expired type:None path:null path: null type: None
> 2017-05-06 18:58:45.360 WARN  (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr)
[   ] o.a.s.c.c.ConnectionManager Our previous ZooKeeper session was expired. Attempting to
reconnect to recover relationship with ZooKeeper...
> 2017-05-06 18:58:45.380 WARN  (OverseerStateUpdate-97740792370385619-idx6.example.org:8983_solr-n_0000000558)
[   ] o.a.s.c.Overseer Solr cannot talk to ZK, exiting Overseer main queue loop
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session
expired for /overseer/queue
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
>         at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:339)
>         at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:336)
>         at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
>         at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:336)
>         at org.apache.solr.cloud.DistributedQueue.fetchZkChildren(DistributedQueue.java:308)
>         at org.apache.solr.cloud.DistributedQueue.firstChild(DistributedQueue.java:285)
>         at org.apache.solr.cloud.DistributedQueue.firstElement(DistributedQueue.java:393)
>         at org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:159)
>         at org.apache.solr.cloud.DistributedQueue.peek(DistributedQueue.java:137)
>         at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:180)
>         at java.lang.Thread.run(Thread.java:745)
> 2017-05-06 18:58:45.381 WARN  (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr)
[   ] o.a.s.c.c.DefaultConnectionStrategy Connection expired - starting a new one...
> 2017-05-06 18:58:45.382 ERROR (OverseerExitThread) [   ] o.a.s.c.Overseer could not read
the data
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session
expired for /overseer_elect/leader
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
>         at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:356)
>         at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:353)
>         at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
>         at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:353)
>         at org.apache.solr.cloud.Overseer$ClusterStateUpdater.checkIfIamStillLeader(Overseer.java:287)
>         at java.lang.Thread.run(Thread.java:745)
> 2017-05-06 18:58:46.453 WARN  (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr
x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) [c:search s:shard2 r:core_node6
x:search_shard2_replica3] o.a.s.c.c.ConnectionManager Watcher org.apache.solr.common.cloud.ConnectionManager@c226cc
name: ZooKeeperConnection Watcher:89.188.14.10:2181,89.188.14.11:2181,89.188.14.12:2181/solr_collection_search
got event WatchedEvent state:Expired type:None path:null path: null type: None
> 2017-05-06 18:58:46.453 WARN  (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr
x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) [c:search s:shard2 r:core_node6
x:search_shard2_replica3] o.a.s.c.c.ConnectionManager Our previous ZooKeeper session was expired.
Attempting to reconnect to recover relationship with ZooKeeper...
> 2017-05-06 18:58:46.460 WARN  (zkCallback-9-thread-5-processing-n:idx6.example.org:8983_solr
x:search_shard2_replica3 s:shard2 c:search r:core_node6-EventThread) [c:search s:shard2 r:core_node6
x:search_shard2_replica3] o.a.s.c.c.DefaultConnectionStrategy Connection expired - starting
a new one...
> 2017-05-06 18:58:53.599 ERROR (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr)
[   ] o.a.s.c.ZkController :org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode
= NodeExists for /live_nodes/idx6.example.org:8983_solr
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>         at org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:526)
>         at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
>         at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:523)
>         at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:466)
>         at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:453)
>         at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:430)
>         at org.apache.solr.cloud.ZkController.createEphemeralLiveNode(ZkController.java:823)
>         at org.apache.solr.cloud.ZkController.access$600(ZkController.java:120)
>         at org.apache.solr.cloud.ZkController$1.command(ZkController.java:340)
>         at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:168)
>         at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:57)
>         at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:142)
>         at org.apache.solr.common.cloud.SolrZkClient$3.lambda$process$0(SolrZkClient.java:268)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> 
> 2017-05-06 18:58:53.599 ERROR (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr)
[   ] o.a.s.c.c.DefaultConnectionStrategy Reconnect to ZooKeeper failed:org.apache.solr.common.cloud.ZooKeeperException:

>         at org.apache.solr.cloud.ZkController$1.command(ZkController.java:392)
>         at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:168)
>         at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:57)
>         at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:142)
>         at org.apache.solr.common.cloud.SolrZkClient$3.lambda$process$0(SolrZkClient.java:268)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode
= NodeExists for /live_nodes/idx6.example.org:8983_solr
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>         at org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:526)
>         at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
>         at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:523)
>         at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:466)
>         at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:453)
>         at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:430)
>         at org.apache.solr.cloud.ZkController.createEphemeralLiveNode(ZkController.java:823)
>         at org.apache.solr.cloud.ZkController.access$600(ZkController.java:120)
>         at org.apache.solr.cloud.ZkController$1.command(ZkController.java:340)
>         ... 10 more
> 2017-05-06 18:58:53.600 WARN  (zkCallback-5-thread-8-processing-n:idx6.example.org:8983_solr)
[   ] o.a.s.c.c.DefaultConnectionStrategy Reconnect to ZooKeeper failed
> 2017-05-06 18:58:57.052 ERROR (qtp1873653341-14807) [   ] o.a.s.h.RequestHandlerBase
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
for /collections/search/state.json
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
>         at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:356)
>         at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:353)
>         at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
>         at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:353)
>         at org.apache.solr.common.cloud.ZkStateReader.fetchCollectionState(ZkStateReader.java:1110)
>         at org.apache.solr.common.cloud.ZkStateReader.forceUpdateCollection(ZkStateReader.java:321)
>         at org.apache.solr.handler.admin.PrepRecoveryOp.execute(PrepRecoveryOp.java:102)
>         at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:370)
>         at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:388)
>         at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:174)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
>         at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:748)
>         at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:729)
>         at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:510)
> 
> After that we occasionally see:
> 
> 2017-05-06 18:58:59.079 ERROR (qtp1873653341-14989) [   ] o.a.s.s.HttpSolrCall null:org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /collections/search/state.json
> 
> We executed a hard Solr restart to get stuff back up. Is this a known issue?
> 
> Thanks,
> Markus
> 
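
For reference, the quoted log shows the watcher ConnectionManager@4f97bdad receiving state:Disconnected at 18:58:34.893 and state:Expired at 18:58:45.360. The small sketch below computes that gap from the two timestamps, which may be useful when comparing against the configured zkClientTimeout. The class name ZkEventGap is made up for illustration, and the timestamp pattern is assumed from the log lines as they appear in this message.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: time between two log timestamps in the format shown above. */
final class ZkEventGap {

    // Assumed from the quoted log lines, e.g. "2017-05-06 18:58:34.893".
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    static Duration between(String earlier, String later) {
        return Duration.between(
                LocalDateTime.parse(earlier, LOG_TS),
                LocalDateTime.parse(later, LOG_TS));
    }

    public static void main(String[] args) {
        // Disconnected and Expired events for ConnectionManager@4f97bdad.
        Duration gap = between("2017-05-06 18:58:34.893", "2017-05-06 18:58:45.360");
        System.out.println(gap.toMillis() + " ms"); // prints "10467 ms"
    }
}
```

So roughly 10.5 seconds passed between the disconnect and the session expiry on that watcher.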
