ambari-user mailing list archives

From Jonathan Hurley <jhur...@hortonworks.com>
Subject Re: Ambari's "HBase Regionserver Process" alert thresholds
Date Fri, 24 Mar 2017 20:27:16 GMT
You're right that the AGGREGATE alert doesn't give you the host name of the affected host.
You can query the alerts endpoint directly to discover the name of the host:
GET api/v1/clusters/<clusterName>/alerts?Alert/state=CRITICAL&Alert/definition_name=hbase_regionserver_process
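
As a rough illustration of how that query could be scripted, here is a minimal Python sketch. The base URL, the cluster name and the helper names are hypothetical, and the response shape (an `items` list whose entries carry an `Alert` object with a `host_name` field) is an assumption that should be checked against your Ambari version.

```python
from urllib.parse import urlencode


def critical_alert_url(base_url, cluster_name,
                       definition_name="hbase_regionserver_process"):
    """Build the alerts query from the message above (helper name is hypothetical)."""
    # safe="/" keeps the slash in predicates such as "Alert/state" unescaped.
    query = urlencode(
        {"Alert/state": "CRITICAL", "Alert/definition_name": definition_name},
        safe="/",
    )
    return f"{base_url}/api/v1/clusters/{cluster_name}/alerts?{query}"


def hosts_from_alerts(payload):
    """Collect host names from a decoded JSON response.

    Assumes each entry in payload["items"] has an "Alert" object with a
    "host_name" field; entries without a host name are skipped.
    """
    hosts = set()
    for item in payload.get("items", []):
        host = item.get("Alert", {}).get("host_name")
        if host:
            hosts.add(host)
    return sorted(hosts)
```

The URL can be fetched with any HTTP client using your Ambari credentials; the decoded JSON body is what `hosts_from_alerts` expects.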

On Mar 24, 2017, at 4:05 PM, Ganesh Viswanathan <gansvv@gmail.com> wrote:

This API call worked to get the state for all regionservers:

/api/v1/clusters/cluster_name/services/HBASE/components/HBASE_REGIONSERVER?fields=host_components/HostRoles/state

I can filter this list for the INSTALLED state to find the stopped one.
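
A minimal Python sketch of that filtering step follows. It assumes the decoded response has a top-level `host_components` list whose entries carry a `HostRoles` object with `host_name` and `state` fields, and that a stopped regionserver reports the INSTALLED state; the function name is hypothetical.

```python
def stopped_regionservers(payload):
    """Return host names whose HBASE_REGIONSERVER component is in INSTALLED state.

    Assumes payload is the decoded JSON of the call above, with a
    "host_components" list of {"HostRoles": {"host_name": ..., "state": ...}}.
    """
    stopped = []
    for component in payload.get("host_components", []):
        roles = component.get("HostRoles", {})
        # INSTALLED means the component is present on the host but not running.
        if roles.get("state") == "INSTALLED":
            stopped.append(roles.get("host_name"))
    return stopped
```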

Thanks!


On Fri, Mar 24, 2017 at 12:34 PM, Ganesh Viswanathan <gansvv@gmail.com> wrote:
Thanks, that explains why I see the CRITICAL alert as soon as I shut down the regionserver
process.

What I am trying to do is set up a WARNING alert for the case when a single "HBase Regionserver
Process" is down and a CRITICAL alert when two or more regionservers are down. I am also trying
to get the hostname where the regionserver is down in the warning case.

Only the "HBase Regionserver Process" alert gives the name of the impacted host (I don't get
this from "RegionServers Health Summary" or "Percent RegionServers Available"), hence I
am trying to modify this alert to suit my use case. Is there a better way to get the
impacted regionserver host from the Ambari API when RegionServers Health Summary fires at WARNING
level?
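
The rule described above (one regionserver down is a WARNING, two or more is a CRITICAL) could be expressed outside Ambari roughly as follows. This is only a sketch of the desired logic, not an Ambari alert definition; the function name and the `critical_at` parameter are hypothetical.

```python
def severity_for_down_hosts(down_hosts, critical_at=2):
    """Map a list of down regionserver hosts to an alert state.

    Zero hosts down is OK, fewer than critical_at is WARNING,
    critical_at or more is CRITICAL. The host names are returned
    alongside the state so the warning case can report which host is down.
    """
    count = len(down_hosts)
    if count == 0:
        state = "OK"
    elif count >= critical_at:
        state = "CRITICAL"
    else:
        state = "WARNING"
    return state, sorted(down_hosts)
```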




On Fri, Mar 24, 2017 at 12:27 PM, Jonathan Hurley <jhurley@hortonworks.com> wrote:
I'm not sure what you mean when you say "turn down" the process. If you are shutting down
the process, then the port is released and the alert will not be able to make a socket connection.
You will get a CRITICAL right away. The values in the alert are thresholds on the round-trip
time coupled with the socket read time. For the warning, it will attempt to make a socket
connection, and if it succeeds and releases in under 1.5 seconds, then there's no warning.
Because you set the CRITICAL value to 3600s but stopped the process, it's not going to wait
3600 seconds, since it can detect much faster that the port is not open for a socket connection.
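
To make the behavior described above concrete, here is a simplified Python sketch of a port check of this kind. It is not Ambari's actual alert code: the function name is hypothetical, the default critical threshold of 5 seconds is only an illustrative value, and the check times just the connection attempt. The point it illustrates is that a refused connection returns CRITICAL immediately, regardless of how large the thresholds are.

```python
import socket
import time


def check_port(host, port, warning_s=1.5, critical_s=5.0):
    """Try a TCP connection and classify the result by elapsed time.

    A failed connection (refused, unreachable, timed out) is CRITICAL
    right away; the thresholds only matter when the connection succeeds.
    """
    start = time.monotonic()
    try:
        # Open and immediately close the connection.
        with socket.create_connection((host, port), timeout=critical_s):
            pass
    except OSError as exc:
        # A closed port is refused quickly, long before any threshold elapses.
        return "CRITICAL", f"connection failed: {exc}"
    elapsed = time.monotonic() - start
    if elapsed >= critical_s:
        return "CRITICAL", f"connected in {elapsed:.3f}s"
    if elapsed >= warning_s:
        return "WARNING", f"connected in {elapsed:.3f}s"
    return "OK", f"connected in {elapsed:.3f}s"
```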

On Mar 24, 2017, at 2:40 PM, Ganesh Viswanathan <gansvv@gmail.com> wrote:

I am using Ambari's "HBase Regionserver Process" alert with 1.5s as WARNING threshold and
3600s as CRITICAL threshold. However, when I test this by turning down the regionserver process,
the alert fires off as CRITICAL directly. Is this a bug?

I am using HDP 2.4 with Ambari 2.2.1.0:
https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.1.0/bk_Ambari_Users_Guide/content/_hbase_service_alerts.html


Thanks,
Ganesh




