ambari-dev mailing list archives

From "Tom Beerbower" <tbeerbo...@hortonworks.com>
Subject Re: Review Request 27582: Alerts: NameNode Health HA Alert Check
Date Tue, 04 Nov 2014 21:13:30 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27582/#review59837
-----------------------------------------------------------

Ship it!

- Tom Beerbower


On Nov. 4, 2014, 7:19 p.m., Jonathan Hurley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27582/
> -----------------------------------------------------------
> 
> (Updated Nov. 4, 2014, 7:19 p.m.)
> 
> 
> Review request for Ambari, Nate Cole and Tom Beerbower.
> 
> 
> Bugs: AMBARI-8143
>     https://issues.apache.org/jira/browse/AMBARI-8143
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> The NameNode HA Health Check is unique in that it requires knowledge of the states of both
the active and passive NN in order to make the correct alert state calculation. It also doesn't
need to run on every host that's a NAMENODE component.
> 
> This presents a problem for alerts as there is no way to say, "Run this alert, but only
on one host, not both". It's also a problem because if the host you want to run it on goes
down, the alert won't run. And finally, it's a problem because if you run the alert on host1
and then that host fails and host2 takes over, the alert appears to be from another host and
does not replace the original alert.
> 
> To solve this, the following changes were made:
> 
> - SCRIPT alerts can now return a status of SKIPPED, meaning that they ran successfully
but don't need to report back their status to the Ambari server; nothing from them will get
put into the agent's alert collector.
> 
> - Alert definitions have a new property to ignore the hosts that the alert instances
are originating from. This allows any host to run the alert and report back to Ambari, but
the server will collapse these into a single current alert; there won't be multiple history
items either.
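
The two changes above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the function names, the dictionary layout, and the `ignore_host` flag name are all assumptions made for the example.

```python
# Hypothetical sketch; none of these names are taken from the Ambari patch.
SKIPPED = "SKIPPED"


def collect(results):
    """Agent side: drop SKIPPED results so they never reach the alert collector."""
    return [r for r in results if r["state"] != SKIPPED]


def current_alert_key(alert, ignore_host):
    """Server side: key the current alert by definition only when the host is ignored."""
    if ignore_host:
        return (alert["definition"],)
    return (alert["definition"], alert["host"])


def apply_alerts(current, alerts, ignore_host):
    """Store each incoming alert under its key; a later alert replaces an earlier one."""
    for alert in alerts:
        current[current_alert_key(alert, ignore_host)] = alert
    return current
```

With the host ignored, an alert reported first by host1 and later by host2 lands under the same key, so the second report replaces the first instead of appearing as a separate alert.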
> 
> 
> Diffs
> -----
> 
>   ambari-agent/src/main/python/ambari_agent/alerts/base_alert.py d93ec48 
>   ambari-agent/src/main/python/ambari_agent/alerts/script_alert.py 12d0d2a 
>   ambari-agent/src/test/python/ambari_agent/TestAlerts.py 1f8d0c0 
>   ambari-agent/src/test/python/ambari_agent/dummy_files/test_script.py 278c26c 
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/AlertDefinitionResourceProvider.java 86e7b7e 
>   ambari-server/src/main/java/org/apache/ambari/server/events/listeners/AlertReceivedListener.java bcbe823 
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/AlertDefinitionEntity.java 6374342 
>   ambari-server/src/main/java/org/apache/ambari/server/state/alert/AlertDefinition.java 961fb66 
>   ambari-server/src/main/java/org/apache/ambari/server/state/alert/AlertDefinitionFactory.java cd937ef 
>   ambari-server/src/main/java/org/apache/ambari/server/upgrade/SchemaUpgradeHelper.java e1d5dca 
>   ambari-server/src/main/java/org/apache/ambari/server/upgrade/UpgradeCatalog200.java PRE-CREATION 
>   ambari-server/src/main/resources/Ambari-DDL-MySQL-CREATE.sql 1b16c2f 
>   ambari-server/src/main/resources/Ambari-DDL-Oracle-CREATE.sql ef7b564 
>   ambari-server/src/main/resources/Ambari-DDL-Postgres-CREATE.sql 18fe6d4 
>   ambari-server/src/main/resources/Ambari-DDL-Postgres-EMBEDDED-CREATE.sql fa131fd 
>   ambari-server/src/main/resources/stacks/HDP/2.0.6/services/HDFS/alerts.json a409230 
>   ambari-server/src/main/resources/stacks/HDP/2.0.6/services/HDFS/package/files/alert_ha_namenode_health.py PRE-CREATION 
>   ambari-server/src/test/java/org/apache/ambari/server/api/services/AmbariMetaInfoTest.java 2b1853a 
>   ambari-server/src/test/java/org/apache/ambari/server/controller/internal/AlertDefinitionResourceProviderTest.java 7823994 
>   ambari-server/src/test/java/org/apache/ambari/server/upgrade/UpgradeCatalog200Test.java PRE-CREATION 
>   ambari-server/src/test/resources/stacks/HDP/2.0.5/services/HDFS/alerts.json 92e7b8f 
> 
> Diff: https://reviews.apache.org/r/27582/diff/
> 
> 
> Testing
> -------
> 
> Tested on clusters with both HA disabled and enabled. When enabled, verified that failing
different instances of the NameNodes had the correct effect on the alert:
> 
>         "state" : "CRITICAL",
>         "text" : "Active[], Standby['c6402.ambari.apache.org:50070'], Unknown['c6401.ambari.apache.org:50070']"
>         
>         "state" : "CRITICAL",
>         "text" : "Active['c6402.ambari.apache.org:50070'], Standby[], Unknown['c6401.ambari.apache.org:50070']"
>         
>         "state" : "OK",
>         "text" : "Active['c6402.ambari.apache.org:50070'], Standby['c6401.ambari.apache.org:50070'], Unknown[]"
>         
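
The three sample results above are consistent with a rule like the one sketched below. The message does not show the script's actual logic, so this is only a guess inferred from those three cases: the function name `ha_state` is hypothetical, and it assumes the alert is OK only when at least one active and one standby NameNode are known.

```python
# Hypothetical sketch inferred from the three sample outputs; not the patch's script.
def ha_state(active, standby, unknown):
    """Return (state, text) for lists of active, standby and unknown NameNode addresses."""
    text = "Active{0}, Standby{1}, Unknown{2}".format(active, standby, unknown)
    # Assumption: a healthy HA pair needs at least one active and one standby NameNode.
    state = "OK" if active and standby else "CRITICAL"
    return state, text


state, text = ha_state(
    ["c6402.ambari.apache.org:50070"], [], ["c6401.ambari.apache.org:50070"]
)
print(state, text)
```

How a case with both an active and a standby NameNode plus an unknown one should be classified is not covered by the samples, so the sketch leaves that as OK without any claim that the real check behaves the same way.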
> New tests added as well...
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
>

