ambari-dev mailing list archives

From "Alejandro Fernandez" <afernan...@hortonworks.com>
Subject Review Request 38651: AMBARI-13194. Alert definition when DataNode data dirs are likely to become unmounted
Date Tue, 22 Sep 2015 22:17:50 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38651/
-----------------------------------------------------------

Review request for Ambari, Andrew Onischuk, Dmitro Lisnichenko, Jonathan Hurley, Nate Cole,
Sumit Mohanty, Srimanth Gunturi, and Sid Wagle.


Bugs: AMBARI-13194
    https://issues.apache.org/jira/browse/AMBARI-13194


Repository: ambari


Description
-------

Ambari uses the HDFS property dfs.datanode.data.dir.mount.file, whose value is typically
/etc/hadoop/conf/dfs_data_dir_mount.hist, to track the mount point of each data dir.

E.g.,
{code}
/hadoop01/data,/device1
/hadoop02/data,/device2
/hadoop03/data,/     # this one is on root, the others are all on mount points.
{code}
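
As a rough illustration of the format above, here is a minimal Python sketch that parses
such a file into a mapping from data dir to mount point. The name parse_mount_history is
hypothetical and not taken from the patch, and treating "#" as a comment marker is an
assumption based only on the annotated example.

```python
def parse_mount_history(text):
    """Parse 'data_dir,mount_point' lines into a dict.

    Assumption: blank lines and anything after '#' are ignored.
    """
    mounts = {}
    for raw_line in text.splitlines():
        # Drop a trailing comment, then surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        data_dir, sep, mount_point = line.partition(",")
        if not sep:
            continue  # malformed line without a comma; skip it
        mounts[data_dir.strip()] = mount_point.strip()
    return mounts


sample = """\
/hadoop01/data,/device1
/hadoop02/data,/device2
/hadoop03/data,/     # this one is on root, the others are all on mount points.
"""

history = parse_mount_history(sample)
for data_dir, mount_point in sorted(history.items()):
    print(data_dir, "->", mount_point)
```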

Whenever a drive becomes unmounted, Ambari detects that the data dir was previously on a
mount and will not create that data dir; HDFS can still tolerate the failure if
dfs.datanode.failed.volumes.tolerated is greater than 0.
However, if the /etc/hadoop/conf/dfs_data_dir_mount.hist file is deleted, Ambari loses
this knowledge and will create the data dir, even if it ends up on the root partition.
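
The decision described above can be sketched as follows. This is only an illustration of
the stated behavior, not the logic in dfs_datanode_helper.py; the function name and its
parameters are hypothetical.

```python
def should_create_data_dir(data_dir, last_known_mounts, current_mount):
    """Decide whether a data dir may be (re)created.

    last_known_mounts: dict of data dir -> mount point recorded earlier
                       (empty if the history file was deleted).
    current_mount:     mount point the data dir resolves to right now.
    """
    previous = last_known_mounts.get(data_dir)
    if previous is None:
        # No history to compare against, so the dir is created,
        # even if it now lands on the root partition.
        return True
    if previous != "/" and current_mount == "/":
        # Previously on a mount, now on root: the drive was likely unmounted.
        return False
    return True


history = {"/hadoop01/data": "/device1"}
print(should_create_data_dir("/hadoop01/data", history, "/"))
print(should_create_data_dir("/hadoop01/data", {}, "/"))
```

The second call shows the problem case: with the history gone, nothing stops the data dir
from being created on the root partition.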

To improve tracking, create an alert definition that checks the following:
* warning status if the /etc/hadoop/conf/dfs_data_dir_mount.hist file is deleted
* critical status if at least one of the data dirs is mounted on the root partition and at
least one data dir is on a mount
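
The two checks above can be sketched roughly as below. This is not the code from
alert_datanode_unmounted_data_dir.py; the names find_mount_point and evaluate, the
returned messages, and the choice to report CRITICAL ahead of WARNING when both conditions
hold are all assumptions made for illustration.

```python
import os

OK, WARNING, CRITICAL = "OK", "WARNING", "CRITICAL"


def find_mount_point(path):
    """Walk up from path until a mount point is reached."""
    path = os.path.abspath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


def evaluate(data_dir_mounts, history_file_exists):
    """Return (status, message) for a dict of data dir -> current mount point."""
    on_root = sorted(d for d, m in data_dir_mounts.items() if m == "/")
    on_mount = sorted(d for d, m in data_dir_mounts.items() if m != "/")

    # Mixed layout: some dirs on root while others are on mounts.
    if on_root and on_mount:
        return CRITICAL, "Data dirs on the root partition: " + ", ".join(on_root)
    if not history_file_exists:
        return WARNING, "Mount history file is missing"
    return OK, "All data dirs are consistent"


mounts = {"/hadoop01/data": "/device1", "/hadoop03/data": "/"}
status, message = evaluate(mounts, history_file_exists=True)
print(status, message)
```

On a real host, the mapping could be built with find_mount_point for each configured data
dir, and history_file_exists could come from os.path.isfile on the mount history file.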


Diffs
-----

  ambari-common/src/main/python/resource_management/core/providers/system.py 213adc5
  ambari-common/src/main/python/resource_management/libraries/functions/dfs_datanode_helper.py a05e162
  ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/alerts.json 477fd95
  ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/alerts/alert_datanode_unmounted_data_dir.py PRE-CREATION
  ambari-server/src/test/python/stacks/2.0.6/HDFS/test_alert_datanode_unmounted_data_dir.py PRE-CREATION

Diff: https://reviews.apache.org/r/38651/diff/


Testing
-------

* Python unit tests passed
* Verified that the alert worked on several hosts for all 3 types of statuses (WARNING, CRITICAL, OK)
* Also checked that it did not run on a host without DataNode, and that it did run once I added
DataNode to that host


Thanks,

Alejandro Fernandez

