lucene-dev mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-6923) AutoAddReplicas should consult live nodes also to see if a state has changed
Date Mon, 12 Jan 2015 22:54:34 GMT

    [ https://issues.apache.org/jira/browse/SOLR-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274340#comment-14274340 ]

ASF subversion and git services commented on SOLR-6923:
-------------------------------------------------------

Commit 1651221 from [~anshumg] in branch 'dev/trunk'
[ https://svn.apache.org/r1651221 ]

SOLR-6923: AutoAddReplicas also consults live_nodes to see if a state change has happened

> AutoAddReplicas should consult live nodes also to see if a state has changed
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-6923
>                 URL: https://issues.apache.org/jira/browse/SOLR-6923
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Varun Thacker
>         Attachments: SOLR-6923.patch
>
>
> - I ran the following:
> {code}
> ./solr start -e cloud -noprompt
> kill -9 <pid-of-node2>  # not the node that is running ZK
> {code}
> - /live_nodes reflects that the node is gone.
> - This is the only message logged on the node1 server after killing node2:
> {code}
> 45812 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9983] WARN  org.apache.zookeeper.server.NIOServerCnxn – caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid 0x14ac40f26660001, likely client has closed socket
>     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>     at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> - The graph shows node2 in the 'Gone' state.
> - clusterstate.json keeps showing the replica as 'active':
> {code}
> {"collection1":{
>     "shards":{"shard1":{
>         "range":"80000000-7fffffff",
>         "state":"active",
>         "replicas":{
>           "core_node1":{
>             "state":"active",
>             "core":"collection1",
>             "node_name":"169.254.113.194:8983_solr",
>             "base_url":"http://169.254.113.194:8983/solr",
>             "leader":"true"},
>           "core_node2":{
>             "state":"active",
>             "core":"collection1",
>             "node_name":"169.254.113.194:8984_solr",
>             "base_url":"http://169.254.113.194:8984/solr"}}}},
>     "maxShardsPerNode":"1",
>     "router":{"name":"compositeId"},
>     "replicationFactor":"1",
>     "autoAddReplicas":"false",
>     "autoCreated":"true"}}
> {code}
> One immediate problem I can see is that AutoAddReplicas doesn't work, since clusterstate.json never changes. Other features may be affected as well.
> As a first thought on handling this: could the shard leader listen for changes on /live_nodes and, if it has replicas on a node that has disappeared, mark them as 'down' in clusterstate.json?
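
The check discussed in the issue, treating a replica as down when its node is missing from /live_nodes even though clusterstate.json still reports it as 'active', can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the code from commit 1651221 and not Solr's API: the `LiveNodesCheck` class, the `Replica` holder, and both method names are hypothetical, and the live node set is passed in as an ordinary `Set<String>` rather than read from ZooKeeper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names come from Solr itself.
final class LiveNodesCheck {

    // Minimal stand-in for one replica entry in clusterstate.json.
    static final class Replica {
        final String name;      // e.g. "core_node2"
        final String nodeName;  // e.g. "169.254.113.194:8984_solr"
        final String state;     // state as recorded in clusterstate.json

        Replica(String name, String nodeName, String state) {
            this.name = name;
            this.nodeName = nodeName;
            this.state = state;
        }
    }

    // The recorded state is trusted only if the replica's node is still live;
    // otherwise the replica is treated as "down".
    static String effectiveState(Replica replica, Set<String> liveNodes) {
        return liveNodes.contains(replica.nodeName) ? replica.state : "down";
    }

    // Names of replicas recorded as "active" whose node is no longer live.
    static List<String> staleActiveReplicas(List<Replica> replicas, Set<String> liveNodes) {
        List<String> stale = new ArrayList<>();
        for (Replica replica : replicas) {
            if ("active".equals(replica.state)
                    && !"active".equals(effectiveState(replica, liveNodes))) {
                stale.add(replica.name);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        // Replica data taken from the clusterstate.json shown in the issue.
        List<Replica> replicas = Arrays.asList(
                new Replica("core_node1", "169.254.113.194:8983_solr", "active"),
                new Replica("core_node2", "169.254.113.194:8984_solr", "active"));

        // After "kill -9" on node2, only node1 remains under /live_nodes.
        Set<String> liveNodes = new HashSet<>(Arrays.asList("169.254.113.194:8983_solr"));

        System.out.println(staleActiveReplicas(replicas, liveNodes)); // prints [core_node2]
    }
}
```

With the data from the report, `core_node2` is flagged because its node name is absent from the live set while its recorded state is still 'active'. A failover feature that consults this effective state instead of the recorded one would notice the lost replica even if clusterstate.json is never updated.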



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

