lucene-dev mailing list archives

From "Hoss Man (JIRA)" <>
Subject [jira] [Created] (SOLR-9361) Concept of replica state being "down" is confusing and misleading (especially w/DELETEREPLICA)
Date Fri, 29 Jul 2016 23:09:20 GMT
Hoss Man created SOLR-9361:

             Summary: Concept of replica state being "down" is confusing and misleading (especially
                 Key: SOLR-9361
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Hoss Man

In a thread on solr-user, Jerome Yang pointed out some really confusing behavior regarding
a "down" node and DELETEREPLICA's behavior when a node is not shut down cleanly...

I'll post a comment in a moment with a detailed walk-through of how confusing the "state"
of a node/replica can be when a machine crashes, but the summary highlights are...

* The Admin UI & CLUSTERSTATUS API use different terminology to describe replicas hosted on
machines that can't be reached
** CLUSTERSTATUS API lists the status as "down"
** the Admin UI displays them as "Gone" (even though it also has an option for "Down" which
never seems to be used)
* Neither the Admin UI nor the CLUSTERSTATUS API distinguishes replicas on nodes that were
shut down cleanly from replicas on nodes that just vanished from the cluster (ie: catastrophic
failure / network partitioning)
* DELETEREPLICA w/ {{onlyIfDown=true}} only works if a replica was shut down cleanly
** For a replica that was on a node that had a catastrophic failure, using {{onlyIfDown=true}}
causes an error that the replica {{state is 'active'}}
*** This in spite of the fact that CLUSTERSTATUS API explicitly says {{"state":"down"}} for
that replica
* DELETEREPLICA on any replica that was hosted on a node that is no longer up (either because
it was cleanly shut down and {{onlyIfDown=true}} is used, or because it is down for any reason
and {{onlyIfDown=false}} is used) generates a failure that "{{Server refused connection}}"
** This in spite of the fact that the DELETEREPLICA otherwise appears to have succeeded
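
For anyone trying to reproduce the points above, here is a minimal Python sketch (not part of Solr, and not a fix). It assumes the CLUSTERSTATUS JSON response has a {{cluster.live_nodes}} list and {{collections -> shards -> replicas}} entries carrying {{state}} and {{node_name}}; the helper names and the base URL in the usage note are hypothetical. It lists replicas whose node is not live alongside the state CLUSTERSTATUS reports for them, and builds the DELETEREPLICA request so both {{onlyIfDown}} variants can be tried.

```python
from urllib.parse import urlencode


def replicas_on_missing_nodes(cluster_status):
    """List replicas whose hosting node is absent from live_nodes.

    `cluster_status` is an already-parsed CLUSTERSTATUS JSON response.
    The key layout used here is an assumption about that response.
    Returns (collection, shard, replica, reported_state) tuples.
    """
    cluster = cluster_status["cluster"]
    live_nodes = set(cluster.get("live_nodes", []))
    found = []
    for coll_name, coll in cluster.get("collections", {}).items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                # Clean shutdown and a crash look the same from here:
                # all we can observe is that the node is not live.
                if replica.get("node_name") not in live_nodes:
                    found.append(
                        (coll_name, shard_name, replica_name, replica.get("state"))
                    )
    return found


def delete_replica_url(base_url, collection, shard, replica, only_if_down):
    """Build a Collections API DELETEREPLICA URL (hypothetical helper)."""
    params = {
        "action": "DELETEREPLICA",
        "collection": collection,
        "shard": shard,
        "replica": replica,
        "onlyIfDown": "true" if only_if_down else "false",
        "wt": "json",
    }
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)
```

Calling {{delete_replica_url("http://localhost:8983/solr", ...)}} once with {{only_if_down=True}} and once with {{False}} for each tuple returned by {{replicas_on_missing_nodes}} gives the two requests whose behavior is compared in the bullets above.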

...there are probably multiple underlying bugs here that are exponentially worse in the context
of each other.  We should spin off new issues as needed to track them once they are concretely
identified, but I wanted to open this "uber issue" to capture the overall experience.
