lucene-solr-user mailing list archives

From danny teichthal <>
Subject Re: SolrCloud - Strategy for recovering cluster states
Date Tue, 01 Mar 2016 20:09:29 GMT
Just summarizing my questions, in case the long mail is a little intimidating:
1. Is there a best practice or an automated tool for overcoming problems in
cluster state caused by ZooKeeper disconnections?
2. Creating a collection via core admin is discouraged; is that also true for core discovery?

I would like to be able to specify collection.configName in the and, when starting the server, have the collection created
and linked to the specified config name.

On Mon, Feb 29, 2016 at 4:01 PM, danny teichthal <> wrote:

> Hi,
> I would like to describe a process we use for overcoming problems in
> cluster state when we have networking issues. I would appreciate it if anyone
> could point out the flaws in this solution and say what the best
> practice is for recovery from network problems involving ZooKeeper.
> I'm working with SolrCloud version 5.2.1, with
> ~100 collections in a cluster of 6 machines.
> This is the short procedure:
> 1. Bring the whole cluster down.
> 2. Clear all data from ZooKeeper.
> 3. Upload configuration.
> 4. Restart the cluster.
> We rely on the fact that a collection is created during the core discovery
> process if it does not exist. This gives us a lot of flexibility.
> When the cluster comes up, it reads from and creates the
> collections if needed.
> Since we have only one configuration, the collections are automatically
> linked to it and the cores inherit it from the collection.
> This is a very robust procedure that helped us overcome many problems
> until we stabilized our cluster, which is now pretty stable.
> I know that the leader might change in such a case and updates may be lost,
> but that is OK.
> The problem is that today I want to add a new config set.
> When I add it and clear ZooKeeper, the cores cannot be created because
> there are now 2 configurations. This breaks my recovery procedure.
> I thought about a few options:
> 1. Put the config Name in - this doesn't work. (It is
> supported in CoreAdminHandler, but  is discouraged according to
> documentation)
> 2. Change the recovery procedure to delete not all data from ZooKeeper, but
> only the relevant parts.
> 3. Change the recovery procedure to delete everything, but recreate and link
> configurations for all collections before startup.
> Option #1 is my favorite because it is very simple. It is currently not
> supported, but from looking at the code it does not look complex to
> implement.
> My questions are:
> 1. Is there something wrong with the recovery procedure that I described?
> 2. What is the best way to fix problems in cluster state, other than
> editing clusterstate.json manually? Is there an automated tool for that? We
> have about 100 collections in a cluster, so manual editing is not really a
> solution.
> 3. Is creating a collection via is also discouraged?
> I would very much appreciate any answers or thoughts on that.
> Thanks,
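
Option 3 in the quoted message (clear ZooKeeper, then re-upload the config sets and link every collection to its config before the nodes start) could be scripted roughly as below. This is only a sketch under stated assumptions, not a tested recovery tool. It builds command lines for Solr's zkcli.sh using its clear, upconfig and linkconfig commands, and does not execute them. The script path, the ZooKeeper address, the znode paths to clear, and the config and collection names are all hypothetical placeholders. Whether a linkconfig entry is enough for core discovery to pick the right config set on 5.2.1 should be verified on a test cluster first.

```python
import shlex
from typing import Dict, List

# Assumed location of zkcli.sh inside a Solr 5.x install; adjust as needed.
ZKCLI = "server/scripts/cloud-scripts/zkcli.sh"


def recovery_commands(
    zk_host: str,
    clear_paths: List[str],
    config_dirs: Dict[str, str],
    collection_configs: Dict[str, str],
) -> List[List[str]]:
    """Build zkcli.sh command lines for steps 2 and 3 of the procedure.

    clear_paths        -- znode paths to wipe (chosen by the caller)
    config_dirs        -- config set name -> local conf directory
    collection_configs -- collection name -> config set name
    Steps 1 and 4 (stopping and restarting all nodes) are left out.
    """
    base = [ZKCLI, "-zkhost", zk_host]
    commands: List[List[str]] = []

    # Step 2: clear the chosen znodes.
    for path in clear_paths:
        commands.append(base + ["-cmd", "clear", path])

    # Step 3a: upload every config set.
    for name, directory in sorted(config_dirs.items()):
        commands.append(
            base + ["-cmd", "upconfig", "-confdir", directory, "-confname", name]
        )

    # Step 3b: link each collection to its config set before startup.
    for collection, conf in sorted(collection_configs.items()):
        if conf not in config_dirs:
            raise ValueError(f"{collection}: unknown config set {conf!r}")
        commands.append(
            base + ["-cmd", "linkconfig", "-collection", collection, "-confname", conf]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical example values, printed for review rather than executed.
    for cmd in recovery_commands(
        "zk1:2181",
        ["/collections", "/configs", "/clusterstate.json"],
        {"confA": "/opt/configs/confA", "confB": "/opt/configs/confB"},
        {"orders": "confA", "customers": "confB"},
    ):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the collection-to-config mapping in one place outside ZooKeeper is what makes a wipe repeatable once there is more than one config set. Printing the commands instead of running them leaves room to review them before touching a live ensemble.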
