ambari-dev mailing list archives

From "Andrew Onischuk" <aonis...@hortonworks.com>
Subject Re: Review Request 21113: Add Nagios alert if HDFS last checkpoint time exceeds threshold
Date Tue, 13 May 2014 14:06:42 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21113/
-----------------------------------------------------------

(Updated May 13, 2014, 2:06 p.m.)


Review request for Ambari and Myroslav Papirkovskyy.


Bugs: AMBARI-5681
    https://issues.apache.org/jira/browse/AMBARI-5681


Repository: ambari


Description
-------

Description: If the Secondary NameNode (SNN) fails to merge edit files for any
reason, Nagios doesn't alert on it.

PROBLEM: If the SNN fails to merge edit files for a long time, for any reason,
the failure goes undetected. The edit files can then grow very large and slow
down NameNode performance. In some cases, this can lead to corruption of the
NameNode edit files.  
BUSINESS IMPACT: If Nagios doesn't alert on SNN failures, this will
eventually cause long downtime for customers and a possibility of data
loss.

STEPS TO REPRODUCE:

  * SNN fails to merge edit files for any reason
  * NameNode edit files grow in size
  * Edit files become corrupted

ACTUAL BEHAVIOR: Nagios doesn't fire a critical alarm  
EXPECTED BEHAVIOR: Nagios should fire a critical alarm
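
For illustration only, the expected behavior can be sketched as a threshold check on the age of the last checkpoint. This is a minimal sketch, not the code under review: the function name `checkpoint_status`, its parameters, and the message wording are all hypothetical, and it assumes the caller has already obtained the last checkpoint time in milliseconds since the epoch. The exit codes follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def checkpoint_status(last_checkpoint_ms, now_ms, warn_secs, crit_secs):
    """Return (exit_code, message) based on the age of the last checkpoint.

    Hypothetical helper: all parameter names are assumptions.
    last_checkpoint_ms and now_ms are milliseconds since the epoch;
    warn_secs and crit_secs are age thresholds in seconds.
    """
    # Without a usable timestamp we cannot judge the checkpoint age.
    if last_checkpoint_ms is None or last_checkpoint_ms <= 0:
        return UNKNOWN, "UNKNOWN: last checkpoint time is not available"

    age_secs = (now_ms - last_checkpoint_ms) // 1000

    # Check the stricter threshold first so CRITICAL wins over WARNING.
    if age_secs >= crit_secs:
        return CRITICAL, "CRITICAL: last checkpoint was %d seconds ago" % age_secs
    if age_secs >= warn_secs:
        return WARNING, "WARNING: last checkpoint was %d seconds ago" % age_secs
    return OK, "OK: last checkpoint was %d seconds ago" % age_secs
```

A wrapper script would print the message and exit with the returned code so that Nagios can pick up the state. The thresholds are left to the caller; suitable values depend on the cluster's configured checkpoint period.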

SUPPORT ANALYSIS: N/A

Note:

We need to get this fixed and alert our customers to add the Nagios alarm
ASAP.


Diffs (updated)
-----

  ambari-server/src/main/resources/stacks/HDP/2.0.6/services/NAGIOS/package/files/check_checkpoint_time.py PRE-CREATION
  ambari-server/src/main/resources/stacks/HDP/2.0.6/services/NAGIOS/package/scripts/nagios_server_config.py 4089b2e
  ambari-server/src/main/resources/stacks/HDP/2.0.6/services/NAGIOS/package/scripts/params.py 2e41c23
  ambari-server/src/main/resources/stacks/HDP/2.0.6/services/NAGIOS/package/templates/hadoop-commands.cfg.j2 ff03bf9
  ambari-server/src/main/resources/stacks/HDP/2.0.6/services/NAGIOS/package/templates/hadoop-services.cfg.j2 e7fda1a
  ambari-server/src/test/python/stacks/2.0.6/NAGIOS/test_nagios_server.py 145b443

Diff: https://reviews.apache.org/r/21113/diff/


Testing
-------


Thanks,

Andrew Onischuk

