lucene-dev mailing list archives

From "Per Steffensen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3785) Cluster-state inconsistent
Date Fri, 09 Nov 2012 10:36:12 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493888#comment-13493888 ]

Per Steffensen commented on SOLR-3785:
--------------------------------------

Well, I believe the entire thing with the Overseer is a bad idea. It requires that at least one
Solr is running before you can trust the state descriptions in ZK, even if this particular
"issue" SOLR-3785 is solved using the Overseer. We have clients that use the state descriptions
(through CloudSolrServer/ZkStateReader) to detect whether the Solr cluster is running well enough
to use. If all Solrs are down, I believe that cannot be seen from the state itself (you can check
live-nodes, and if no Solrs are running you know that you can't trust it).

I think you should remove the Overseer entirely and modify ZkStateReader so that it can, single-handedly,
look at the ZK state and calculate the correct ClusterState. E.g. shard-state could be maintained
by the Solr running the shard (as it is today), but as an ephemeral node that disappears when
the Solr is not running. ZkStateReader should have logic that, when calculating a shard-state,
looks at this ephemeral node and, if it is missing, assumes "down"-state.
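
The idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the class EphemeralShardState, its methods and the example path are hypothetical and not part of Solr or ZooKeeper, and a plain in-memory map stands in for the ephemeral nodes in ZK. The point is just the rule "ephemeral node missing means down".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, not Solr code: a map stands in for the ephemeral
// shard-state nodes that currently exist in ZK.
public class EphemeralShardState {
    public static final String DOWN = "down";

    // shard path -> state published by the Solr that runs the shard
    private final Map<String, String> ephemeralNodes = new HashMap<>();

    // The Solr running the shard publishes its own state.
    public void publish(String shardPath, String state) {
        ephemeralNodes.put(shardPath, state);
    }

    // Models ZK removing the ephemeral node when the owning Solr's
    // session ends (for example because the Solr was shut down).
    public void sessionEnded(String shardPath) {
        ephemeralNodes.remove(shardPath);
    }

    // What a reader would report: the published state if the node
    // exists, otherwise "down".
    public String effectiveState(String shardPath) {
        return ephemeralNodes.getOrDefault(shardPath, DOWN);
    }
}
```

With this rule a reader needs no running Solr to reach a correct answer: if every Solr is gone, every ephemeral node is gone too, and every shard is reported as "down".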

Regards, Per Steffensen
                
> Cluster-state inconsistent
> --------------------------
>
>                 Key: SOLR-3785
>                 URL: https://issues.apache.org/jira/browse/SOLR-3785
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.0
>         Environment: Self-built Solr release built on Apache Solr revision 1355667 from the 4.x branch
>            Reporter: Per Steffensen
>         Attachments: SOLR-3785.patch
>
>
> Information in CloudSolrServer.getZkStateReader().getCloudState() (called cloudState below) seems to be inconsistent.
> I have a Solr running the leader of slice "sliceName" in collection "collectionName", with no replica to take over. I shut down this Solr, and I want to detect that there is now no active leader.
> I do e.g.
> {code}
> ZkNodeProps leader = cloudState.getLeader(indexName, sliceName);
> boolean notActive = (leader == null) || !leader.containsKey(ZkStateReader.STATE_PROP)
>     || !leader.get(ZkStateReader.STATE_PROP).equals(ZkStateReader.ACTIVE);
> {code}
> This does not work. It seems that the state of a shard is not changed when this Solr goes down.
> I do e.g.
> {code}
> ZkNodeProps leader = cloudState.getLeader(indexName, sliceName);
> boolean notActive = (leader == null) || !leader.containsKey(ZkStateReader.STATE_PROP)
>     || !leader.get(ZkStateReader.STATE_PROP).equals(ZkStateReader.ACTIVE)
>     || !leader.containsKey(ZkStateReader.NODE_NAME_PROP)
>     || !cloudState.getLiveNodes().contains(leader.get(ZkStateReader.NODE_NAME_PROP));
> {code}
> This works.
> It seems that the live-nodes of cloudState are updated when a Solr goes down, but that some of the other info available through cloudState is not, e.g. getLeader().
> This might already have been solved on the 4.x branch in a revision later than 1355667. If so, please just tell me - thanks.
> Regards, Per Steffensen
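
For clients that need the workaround from the description, the second check can be wrapped in one helper. The sketch below is self-contained and makes assumptions: LeaderCheck and isLeaderUsable are hypothetical names, a plain Map stands in for ZkNodeProps, and the three string constants are assumed stand-ins for ZkStateReader.STATE_PROP, ZkStateReader.NODE_NAME_PROP and ZkStateReader.ACTIVE.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Solr.
public class LeaderCheck {
    // Assumed stand-ins for the ZkStateReader constants used in the description.
    public static final String STATE_PROP = "state";
    public static final String NODE_NAME_PROP = "node_name";
    public static final String ACTIVE = "active";

    // A leader counts as usable only if it exists, its recorded state is
    // "active", and its node is listed in live-nodes.
    public static boolean isLeaderUsable(Map<String, String> leader, Set<String> liveNodes) {
        if (leader == null) {
            return false;
        }
        if (!ACTIVE.equals(leader.get(STATE_PROP))) {
            return false;
        }
        String nodeName = leader.get(NODE_NAME_PROP);
        return nodeName != null && liveNodes.contains(nodeName);
    }
}
```

The live-nodes test is what catches the stale case: a leader entry that still says "active" is rejected once its node has left live-nodes.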
