lucene-solr-user mailing list archives

From Erick Erickson <>
Subject Re: CloudSolrClient$RouteException: Cannot talk to ZooKeeper - Updates are disabled.
Date Sat, 19 Nov 2016 01:11:05 GMT
The cluster state in ZooKeeper shouldn't be changing
very often, only when nodes come and go.

bq: At that time I am also running queries (that return
millions of docs).

As in rows=millions? That's an anti-pattern; if that's the
case, you're probably saturating the network and the like.
If you mean your numFound is in the millions, then this is
unlikely to be a problem.
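
For reference, cursor-based deep paging boils down to a handful of
request parameters. Here's a minimal sketch in plain Java (no SolrJ
dependency) that only builds the query string for each page. The class
and method names are made up for illustration, and "id" is assumed to
be the uniqueKey field. cursorMark, rows and sort are the standard Solr
parameters: the first request uses cursorMark=*, the sort has to
include the uniqueKey field, and you stop when the nextCursorMark in
the response equals the one you sent.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds request parameters for cursor-based paging.
class CursorPageParams {

    // Query string for one page. Pass "*" as cursorMark for the first page,
    // then the nextCursorMark value from the previous response.
    static String pageQuery(String q, String uniqueKeyField, int rows, String cursorMark) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        params.put("rows", Integer.toString(rows));       // modest page size, not millions
        params.put("sort", uniqueKeyField + " asc");      // sort must include the uniqueKey
        params.put("cursorMark", cursorMark);
        return params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Paging is finished when Solr hands back the same mark that was sent.
    static boolean isLastPage(String sentCursorMark, String nextCursorMark) {
        return sentCursorMark.equals(nextCursorMark);
    }

    public static void main(String[] args) {
        // "id" as the uniqueKey field is an assumption about the schema.
        System.out.println(pageQuery("*:*", "id", 1000, "*"));
    }
}
```

The point is that rows stays small per request no matter how large
numFound is.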

You say "clusterstate.json", which indicates you're on
4.x? This has since been changed so that each collection
has its own state.json, so either you upgraded at some
point and didn't transform your ZK (there's a command to
do that), or you're still on 4.x. In that case, can you
upgrade?

My guess is that you have too much going on somehow,
you're overloading your system, and you're getting a
timeout. So increasing the timeout is definitely a
possibility, as is reducing the ingestion load as a test.
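
One way to run the "reduce the ingestion load" test is to pace batches
on the client side. A rough sketch, assuming your ingest job can be
expressed as "send this batch of docs": ThrottledIngest, sendInBatches
and the sendBatch callback are all hypothetical names, and sendBatch
stands in for whatever actually pushes docs to Solr in your job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: sends documents in fixed-size batches with a pause in between.
class ThrottledIngest {

    // Returns the number of batches handed to sendBatch.
    static <T> int sendInBatches(List<T> docs, int batchSize, long pauseMillis,
                                 Consumer<List<T>> sendBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int sent = 0;
        for (int from = 0; from < docs.size(); from += batchSize) {
            int to = Math.min(from + batchSize, docs.size());
            sendBatch.accept(new ArrayList<>(docs.subList(from, to)));
            sent++;
            if (to < docs.size()) {
                try {
                    Thread.sleep(pauseMillis);            // back off before the next batch
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();   // preserve the interrupt and stop early
                    break;
                }
            }
        }
        return sent;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            docs.add(i);
        }
        // The lambda is a placeholder for the real "send to Solr" call.
        int batches = sendInBatches(docs, 4, 50, b -> System.out.println("sending " + b.size() + " docs"));
        System.out.println(batches + " batches sent");
    }
}
```

If the errors go away with a smaller batch size or a longer pause,
that points at overload rather than anything locking in ZK.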


On Fri, Nov 18, 2016 at 4:51 PM, Chetas Joshi <> wrote:
> Hi,
> I have a SolrCloud (on HDFS) of 50 nodes and a ZK quorum of 5 nodes. The
> SolrCloud is having difficulties talking to ZK when I am ingesting data
> into the collections. At that time I am also running queries (that return
> millions of docs). The ingest job is crying with the following exception:
> org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error
> from server at http://xxx/solr/collection1_shard15_replica1: Cannot talk to
> ZooKeeper - Updates are disabled.
> I think this is happening when the ingest job is trying to update the
> clusterstate.json file but the query is reading from that file and thus has
> some kind of a lock on that file. Are there any factors that will cause the
> "READ" to acquire lock for a long time? Is my understanding correct? I am
> using the cursor approach using SolrJ to get back results from Solr.
> How often is the ZK updated with the latest cluster state and what
> parameter governs that? Should I just increase the ZK client timeout so
> that it retries connecting to the ZK for a longer period of time (right now
> it is 15 seconds)?
> Thanks!
