lucene-dev mailing list archives

From "Christine Poerschke (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-5593) shard leader loss due to ZK session expiry
Date Tue, 31 Dec 2013 18:16:50 GMT

    [ https://issues.apache.org/jira/browse/SOLR-5593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859600#comment-13859600 ]

Christine Poerschke commented on SOLR-5593:
-------------------------------------------

bq. One thing that seems kind of silly is that those replicas reject the updates at all. It
seems like perhaps we should relax things a bit so that they would be accepted.

Yes, we are working on changes to DistributedUpdateProcessor that relax the requirement for
getLeaderRetry to succeed within setupRequest: if the phase is DistribPhase.FROMLEADER and
the shard state shows it could not be subShardLeader, then getLeaderRetry success should be
optional.
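
To make the intended condition concrete, here is a minimal, self-contained sketch of the
decision described above. It is not the patch and does not use Solr's classes: the Phase
enum only mirrors the DistribPhase names, and leaderLookupOptional, acceptUpdate and the
couldBeSubShardLeader flag are hypothetical names chosen for illustration.

```java
// Illustrative sketch only; these types are stand-ins, NOT Solr's actual classes.
final class LeaderLookupSketch {

    /** Stand-in for the distributed update phases named in the comment. */
    enum Phase { NONE, TOLEADER, FROMLEADER }

    /**
     * The proposed relaxation: a failed leader lookup is tolerated only when the
     * update was forwarded by a leader (FROMLEADER) and the shard state rules out
     * the sub-shard-leader case. Both parameters are assumptions of this sketch.
     */
    static boolean leaderLookupOptional(Phase phase, boolean couldBeSubShardLeader) {
        return phase == Phase.FROMLEADER && !couldBeSubShardLeader;
    }

    /**
     * Decide whether a replica would accept an update.
     * leaderFound models whether the leader lookup succeeded.
     */
    static boolean acceptUpdate(Phase phase, boolean couldBeSubShardLeader, boolean leaderFound) {
        if (leaderFound) {
            return true; // normal path: the leader is known
        }
        // Leader lookup failed: accept only if the lookup was optional for this request.
        return leaderLookupOptional(phase, couldBeSubShardLeader);
    }
}
```

Under this sketch a replica with no known leader still accepts a FROMLEADER update when the
sub-shard-leader case is ruled out, and rejects every other update for which the lookup failed.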

> shard leader loss due to ZK session expiry
> ------------------------------------------
>
>                 Key: SOLR-5593
>                 URL: https://issues.apache.org/jira/browse/SOLR-5593
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Christine Poerschke
>            Assignee: Mark Miller
>             Fix For: 5.0, 4.7, 4.6.1
>
>         Attachments: CoreAdminHandler.patch
>
>
> The problem we saw was that the shard leader ceased to be shard leader (in our case due
> to its zookeeper session expiring). The followers thus rejected update requests
> (DistributedUpdateProcessor setupRequest's call to ZkStateReader getLeaderRetry) and the
> leader asked them to recover (DistributedUpdateProcessor doFinish). The followers
> published themselves as recovering (CoreAdminHandler handleRequestRecoveryAction) and the
> shard leader loss triggered an election in which none of the followers became the leader
> due to their recovering state (ShardLeaderElectionContext shouldIBeLeader). The former
> shard leader also did not become shard leader because its new seq number placed it after
> the existing replicas (LeaderElector checkIfIamLeader seq <= intSeqs.get(0)).
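
The stalled election described above can be reproduced with a small toy model. This is a
simplified sketch, not Solr's election code: Candidate and electLeader are hypothetical
names, and the model assumes a node may lead only if it holds the lowest sequence number
(the seq <= intSeqs.get(0) check) and is not in the recovering state.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Toy model of the stalled election; NOT Solr's LeaderElector.
final class ElectionStallSketch {

    /** A replica taking part in the election (hypothetical type). */
    static final class Candidate {
        final String name;
        final int seq;            // election sequence number; lowest is first in line
        final boolean recovering; // true if the replica published itself as recovering

        Candidate(String name, int seq, boolean recovering) {
            this.name = name;
            this.seq = seq;
            this.recovering = recovering;
        }
    }

    /** Returns the new leader, or empty if no candidate passes both checks. */
    static Optional<Candidate> electLeader(List<Candidate> candidates) {
        int lowestSeq = candidates.stream()
                .mapToInt(c -> c.seq)
                .min()
                .orElse(Integer.MAX_VALUE);
        return candidates.stream()
                .filter(c -> c.seq <= lowestSeq) // only the head of the queue may lead
                .filter(c -> !c.recovering)      // a recovering replica declines
                .findFirst();
    }

    public static void main(String[] args) {
        // After the session expiry: followers are recovering, and the former
        // leader rejoined with a new, higher sequence number.
        List<Candidate> afterExpiry = Arrays.asList(
                new Candidate("follower1", 1, true),
                new Candidate("follower2", 2, true),
                new Candidate("formerLeader", 3, false));
        System.out.println("leader elected: " + electLeader(afterExpiry).isPresent());
    }
}
```

In this model the only replica that is not recovering is the former leader, which is last
in line, so no candidate satisfies both conditions and the shard is left without a leader.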



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

