lucene-dev mailing list archives

From "Ben DeMott (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-10284) Solr connection to Standalone node in Ensemble causes cluster failure
Date Wed, 21 Feb 2018 23:16:01 GMT

     [ https://issues.apache.org/jira/browse/SOLR-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben DeMott updated SOLR-10284:
------------------------------
    Affects Version/s: 7.0
                       7.1
                       7.2

> Solr connection to Standalone node in Ensemble causes cluster failure
> ---------------------------------------------------------------------
>
>                 Key: SOLR-10284
>                 URL: https://issues.apache.org/jira/browse/SOLR-10284
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: SolrCloud
>    Affects Versions: 6.3, 6.4, 7.0, 7.1, 7.2
>         Environment: SolrCloud, with ZooKeeper <any version>
>            Reporter: Ben DeMott
>            Priority: Major
>
> I posted this issue on the Dev mailing list and was encouraged to create a Jira ticket.
 This isn't a bug per se.
> Solr connects (or reconnects) to ZooKeeper nodes that are running in "standalone" mode
within an ensemble, which causes absolute havoc.
> I work for Dice.com, as one of the core search developers.
> I'm happy to write a patch, as we'll probably do that internally anyway.  I just want
to get consensus from the community about how to provide the best solution.
> My original email describing the issue: 
> http://mail-archives.apache.org/mod_mbox/lucene-dev/201703.mbox/raw/%3CCACbtCQ2cSPA8NbnqCbXZE9nZdT40xFHjpUhAOqUnd%3DqZaRMEsA%40mail.gmail.com%3E/2
> Proposed Solution:
> My thought was an explicit setting in solr.in.sh, "ZK_STANDALONE" (which would default
to TRUE in the solr.in.sh file found next to bin/solr).  Upon connection or reconnection,
the ZooKeeper client would ask the server "are you standalone?".  If the server is standalone
and ZK_STANDALONE=false, the client would disconnect and try the next host.  If all hosts are
standalone, an error would be shown: "No zookeeper hosts available that aren't in standalone
operation - The setting ZK_STANDALONE=false prevents connecting to a standalone Zookeeper"
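
The check described above could be sketched roughly as follows. This is only an illustration in Python, not the Solr or SolrJ implementation. It assumes the "are you standalone" question is asked with ZooKeeper's `srvr` four-letter command, whose output includes a `Mode:` line (for example `Mode: standalone`, `Mode: leader` or `Mode: follower`). The ticket does not say how the check would be made, and newer ZooKeeper releases may require four-letter commands to be whitelisted. The function names `parse_mode`, `query_mode` and `first_usable_host` are hypothetical.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

Host = Tuple[str, int]


def parse_mode(srvr_output: str) -> Optional[str]:
    """Return the value of the 'Mode:' line in `srvr` output, or None if absent."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def query_mode(host: str, port: int = 2181, timeout: float = 2.0) -> Optional[str]:
    """Send the `srvr` four-letter command to one server and return its mode."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the socket after replying
                break
            chunks.append(data)
    return parse_mode(b"".join(chunks).decode("utf-8", errors="replace"))


def first_usable_host(
    hosts: Iterable[Host],
    allow_standalone: bool,
    probe: Callable[[str, int], Optional[str]] = query_mode,
) -> Host:
    """Return the first reachable host that is acceptable under the setting.

    `allow_standalone` plays the role of the proposed ZK_STANDALONE setting.
    Hosts whose mode cannot be determined are accepted (an assumption).
    """
    for host, port in hosts:
        try:
            mode = probe(host, port)
        except OSError:
            continue  # unreachable host: try the next one
        if mode == "standalone" and not allow_standalone:
            continue  # skip standalone servers when the setting forbids them
        return (host, port)
    raise RuntimeError(
        "No zookeeper hosts available that aren't in standalone operation"
    )
```

Passing the probe as a parameter keeps the host-selection logic testable without a running ZooKeeper server.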
> To urge users to use the setting, I would possibly also log a warning if ZK_HOSTS is set,
has multiple hosts in the connection string, and ZK_STANDALONE is not false.
> I can't think of any implicit way to infer the setting, other than this: if the ZK_HOSTS
connection string has multiple hosts, there should be no scenario in which any node
is standalone, so you could assume there should be no standalone servers.  But maybe an explicit
setting is preferable.
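
The warning condition and the implicit rule from the two paragraphs above can be sketched like this. It is a minimal Python illustration, assuming the usual ZooKeeper connection string form `host1:port,host2:port/chroot`. The helper names `split_connect_string`, `should_warn` and `implied_allow_standalone` are hypothetical and do not exist in Solr.

```python
from typing import List, Tuple


def split_connect_string(connect: str) -> Tuple[List[str], str]:
    """Split 'h1:2181,h2:2181/chroot' into (['h1:2181', 'h2:2181'], '/chroot')."""
    hosts_part, sep, chroot = connect.partition("/")
    hosts = [h.strip() for h in hosts_part.split(",") if h.strip()]
    return hosts, (sep + chroot if sep else "")


def should_warn(connect: str, zk_standalone: bool) -> bool:
    """Warn when several hosts are listed but standalone servers are still allowed."""
    hosts, _ = split_connect_string(connect)
    return len(hosts) > 1 and zk_standalone


def implied_allow_standalone(connect: str) -> bool:
    """Implicit rule: more than one host implies no server should be standalone."""
    hosts, _ = split_connect_string(connect)
    return len(hosts) <= 1
```

For example, `should_warn("zk1:2181,zk2:2181,zk3:2181/solr", True)` is true, while a single-host string never triggers the warning.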
> This solution should:
> 1.) be backwards compatible
> 2.) have very little performance impact (1 extra call upon connection to ZK)
> 3.) be isolated to one part of the code.
> *Update 6/26/2017:*
> I started working on this, and it occurred to me the same issue exists for *SolrJ* clients.
 So SolrJ might be the place to make this change. I'm not sure yet.
> A SolrJ client with a multi-node ZK connection string that connects (even temporarily)
to a standalone ZK host will believe there are no Solr hosts that can answer the query,
and you'll get the following error:
> {{CloudSolrClient - Request to collection efc-profiles-match-col failed due to (510)
org.apache.solr.common.SolrException: Could not find a healthy node to handle the request.}}
> I am not as familiar with the SolrJ codebase ... so I'll have to do some digging.
> Instead of moving on to a different ZooKeeper host, the SolrJ client will think everything
is fully working, but that there are no Solr hosts or collections.
>  



