lucene-issues mailing list archives

From Jan Høydahl (Jira) <j...@apache.org>
Subject [jira] [Commented] (SOLR-13899) zkstatus page incorrectly reports zookeeper in error when Zookeeper observers are present
Date Wed, 06 Nov 2019 20:21:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-13899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968669#comment-16968669 ]

Jan Høydahl commented on SOLR-13899:
------------------------------------

I appreciate your Jira report (as the one who built the original zk status page). There is
no great hurry to fix this, as the UI still works and the only problem is that it wrongly
calls this state an error. So I'd appreciate it if you would attempt a fix and come back
with a pull request when you have time. Of your two options I prefer option 1, perhaps as
a first step. Then, of course, it would be nice to handle observers specifically, as you suggest.
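
For illustration only, here is a minimal sketch of what option 1 and a later
observer-aware check could look like. It is not the code in ZookeeperStatusHandler:
the class name, the Status enum, and both method signatures are hypothetical, and
expectedObservers is assumed to be known from the ensemble configuration. It also
assumes, as option 2 in the report does, that the leader's zk_synced_followers
counts only voting followers.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Solr's actual handler code.
class ZkStatusSketch {
    enum Status { GREEN, YELLOW, RED }

    // Option 1 from the report: count an observer exactly like a follower.
    static long countOption1(List<String> states) {
        return states.stream()
            .filter(s -> "follower".equals(s) || "observer".equals(s))
            .count();
    }

    // Observer-aware variant: a missing follower is an error, a missing
    // observer only degrades the ensemble.
    // states: zk_server_state of every host that answered mntr.
    // syncedFollowers: the leader's zk_synced_followers value.
    // expectedObservers: assumed to come from the ensemble configuration.
    static Status classify(List<String> states, int syncedFollowers, int expectedObservers) {
        long leaders = states.stream().filter("leader"::equals).count();
        long followers = states.stream().filter("follower"::equals).count();
        long observers = states.stream().filter("observer"::equals).count();
        if (leaders != 1 || followers != syncedFollowers) {
            return Status.RED;
        }
        return observers < expectedObservers ? Status.YELLOW : Status.GREEN;
    }

    public static void main(String[] args) {
        List<String> allUp =
            Arrays.asList("leader", "follower", "follower", "observer", "observer");
        System.out.println(countOption1(allUp));   // 4
        System.out.println(classify(allUp, 2, 2)); // GREEN
    }
}
```

With one of the two observers missing from the list, classify returns YELLOW
rather than RED, which matches the degraded-but-functional state described in the report.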

> zkstatus page incorrectly reports zookeeper in error when Zookeeper observers are present
> -----------------------------------------------------------------------------------------
>
>                 Key: SOLR-13899
>                 URL: https://issues.apache.org/jira/browse/SOLR-13899
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 8.3.0
>            Reporter: Salvatore
>            Priority: Trivial
>              Labels: easyfix
>         Attachments: zkstatus.png
>
>
> When a ZooKeeper ensemble has 'observers', the zkstatus page incorrectly says the ZooKeeper status is in error (see attachment).
> This is because the [ZookeeperStatusHandler|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/admin/ZookeeperStatusHandler.java] does not account for the '[observer|https://zookeeper.apache.org/doc/current/zookeeperObservers.html]' role whatsoever.
> This should be an easy fix. I see two options:
> 1. Treat observers as followers by changing [L112|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/admin/ZookeeperStatusHandler.java#L112] to
> {code:java}
> if ("follower".equals(state) || "observer".equals(state)) {
> {code}
>  
> 2. Exclude observers from the required follower count by changing [L116|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/admin/ZookeeperStatusHandler.java#L116] to
> {code:java}
>           reportedFollowers = Integer.parseInt(String.valueOf(stat.get("zk_synced_followers")));
> {code}
> Option 1 will make the zkstatus page show an error when an observer is down.
> Option 2 will not make the zkstatus page show an error when an observer is down.
> *Ideally*, additional logic to account for observers should be added to show STATUS_YELLOW when any observers are down (but all followers are up), as this means the ensemble is in a degraded but functional state.
> Happy to create a PR; however, I don't have a lot of free time at home at the moment, so it may take a week or two.
>  
> Additional info:
> See below for example mntr output for the Leader/Follower/Observer roles, noting the Leader's zk_followers and zk_synced_followers values, and the values of zk_server_state.
> Leader:
> {code:java}
> [root@master1 ~]# echo mntr | nc master3 12181
> zk_version 3.5.6-c11b7e26bc554b8523dc929761dd28808913f091, built on 10/08/2019 20:18 GMT
> zk_avg_latency 0
> zk_max_latency 2
> zk_min_latency 0
> zk_packets_received 97
> zk_packets_sent 96
> zk_num_alive_connections 2
> zk_outstanding_requests 0
> zk_server_state leader
> zk_znode_count 92
> zk_watch_count 7
> zk_ephemerals_count 9
> zk_approximate_data_size 236333
> zk_open_file_descriptor_count 64
> zk_max_file_descriptor_count 4096
> zk_followers 4
> zk_synced_followers 2
> zk_pending_syncs 0
> zk_last_proposal_size -1
> zk_max_proposal_size -1
> zk_min_proposal_size -1
> {code}
> Follower:
> {code:java}
> [root@master1 ~]# echo mntr | nc master2 12181
> zk_version	3.5.6-c11b7e26bc554b8523dc929761dd28808913f091, built on 10/08/2019 20:18 GMT
> zk_avg_latency	0
> zk_max_latency	6
> zk_min_latency	0
> zk_packets_received	97
> zk_packets_sent	96
> zk_num_alive_connections	2
> zk_outstanding_requests	0
> zk_server_state	follower
> zk_znode_count	92
> zk_watch_count	7
> zk_ephemerals_count	9
> zk_approximate_data_size	236333
> zk_open_file_descriptor_count	60
> zk_max_file_descriptor_count	4096
> {code}
> Observer:
> {code:java}
> [root@master1 ~]# echo mntr | nc slave1 12181
> zk_version	3.5.6-c11b7e26bc554b8523dc929761dd28808913f091, built on 10/08/2019 20:18 GMT
> zk_avg_latency	0
> zk_max_latency	8
> zk_min_latency	0
> zk_packets_received	174
> zk_packets_sent	173
> zk_num_alive_connections	2
> zk_outstanding_requests	0
> zk_server_state	observer
> zk_znode_count	92
> zk_watch_count	7
> zk_ephemerals_count	9
> zk_approximate_data_size	236333
> zk_open_file_descriptor_count	59
> zk_max_file_descriptor_count	4096
> {code}
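
As a rough illustration of how the mntr samples above could be turned into the
values the report compares (zk_server_state, zk_followers, zk_synced_followers),
here is a small parsing sketch. The MntrParser class and its parse method are
hypothetical and are not taken from Solr. The samples in this message show both
space and tab separators, so the sketch splits each line at the first run of whitespace.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Solr.
class MntrParser {
    // Turns raw mntr output into key/value pairs, keeping the original order.
    static Map<String, String> parse(String raw) {
        Map<String, String> stats = new LinkedHashMap<>();
        for (String line : raw.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // The key ends at the first whitespace; the rest of the line is the
            // value, so "zk_version" keeps its embedded spaces.
            String[] parts = trimmed.split("\\s+", 2);
            stats.put(parts[0], parts.length > 1 ? parts[1] : "");
        }
        return stats;
    }

    public static void main(String[] args) {
        String leader = "zk_server_state leader\nzk_followers 4\nzk_synced_followers 2\n";
        Map<String, String> stats = parse(leader);
        int followers = Integer.parseInt(stats.get("zk_followers"));
        int synced = Integer.parseInt(stats.get("zk_synced_followers"));
        System.out.println(stats.get("zk_server_state") + " " + followers + " " + synced);
    }
}
```

Only the leader's output carries zk_followers and zk_synced_followers in the
samples above, so a caller should expect those keys to be absent for followers and observers.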



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


