lucene-dev mailing list archives

From Mark Miller
Subject Re: Two hour recovery in non-replica setup?
Date Wed, 14 Aug 2013 14:13:04 GMT
There have been hundreds of fixes since 4.0, many of them around ZooKeeper integration. It is
really hard to say what it might be, but I can say that nothing should take that long by design.

A pretty important, somewhat recent bug that was found:
When reconnecting after ZooKeeper expiration, we need to be willing to wait forever, not for
30 seconds.

I don't know whether it has anything to do with what you are seeing, or whether that bug even
existed in 4.0, but it's worth looking at.
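
To make the difference concrete, here is a rough sketch of "wait 30 seconds" versus "wait
forever" when reconnecting. This is not Solr's actual code, just a Python stand-in; every name
in it (reconnect, ReconnectTimeout, FakeClock, service_back_after) is made up for illustration,
and the 45-second outage is an arbitrary assumption.

```python
import itertools
import time


class ReconnectTimeout(Exception):
    """Raised when a bounded reconnect gives up before the service is back."""


def reconnect(connect, max_wait=None, delay=1.0, sleep=time.sleep, clock=time.monotonic):
    """Call connect() until it returns True and return the attempt count.

    max_wait=None retries forever; a number gives up after that many seconds.
    sleep and clock are injectable so the demo can use a fake clock.
    """
    start = clock()
    for attempt in itertools.count(1):
        if connect():
            return attempt
        if max_wait is not None and clock() - start >= max_wait:
            raise ReconnectTimeout(f"gave up after {attempt} attempts")
        sleep(delay)


class FakeClock:
    """Hypothetical clock that advances only when sleep() is called."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def service_back_after(clock, seconds):
    """Return a connect() stand-in that succeeds once `seconds` have elapsed."""
    return lambda: clock.time() >= seconds


def demo():
    # Assumed scenario: the service stays unreachable for 45 seconds.
    bounded_clock = FakeClock()
    try:
        reconnect(service_back_after(bounded_clock, 45), max_wait=30,
                  sleep=bounded_clock.sleep, clock=bounded_clock.time)
        bounded = "connected"
    except ReconnectTimeout:
        bounded = "gave up"

    forever_clock = FakeClock()
    attempts = reconnect(service_back_after(forever_clock, 45), max_wait=None,
                         sleep=forever_clock.sleep, clock=forever_clock.time)
    return bounded, attempts
```

With the bounded wait, an outage longer than the limit leaves the client disconnected for good;
with the unbounded wait, it simply comes back on the first attempt after the service returns.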

- Mark

On Aug 14, 2013, at 9:10 AM, Per Steffensen wrote:

> Hi
> We have a fairly large SolrCloud installation where we continuously index a lot of documents.
> Users are doing searches against the system from time to time.
> From time to time our Solrs lose their ZooKeeper connection. I guess that's what happens.
> But it takes two hours before a Solr that loses its ZK connection has its shards active again.
> I can't imagine what it needs to do for two hours. We are not using replicas, so syncing data
> among replicas is not it. So far we haven't dived into the problem, and we do not know where
> the time is spent: whether it is between when the connection is lost and when the Solr
> "realizes" it, or between when it "realizes" it and when the shards are declared active again.
> We will dive into the problem, but before doing that I wanted to ask here whether this is
> a known problem and, if so, whether it has been solved. FYI we are using Solr 4.0.0.
> Regards, Per Steffensen