ambari-dev mailing list archives

From "Alejandro Fernandez (JIRA)" <>
Subject [jira] [Updated] (AMBARI-12267) Ambari to improve tracking of data dirs becoming unmounted
Date Thu, 02 Jul 2015 20:45:04 GMT


Alejandro Fernandez updated AMBARI-12267:
    Summary: Ambari to improve tracking of data dirs becoming unmounted (was: Ambari to improve tracking of data dirs becoming unmounted.)

> Ambari to improve tracking of data dirs becoming unmounted
> ----------------------------------------------------------
>                 Key: AMBARI-12267
>                 URL:
>             Project: Ambari
>          Issue Type: Story
>          Components: ambari-agent
>    Affects Versions: 2.0.0
>            Reporter: Alejandro Fernandez
>            Assignee: Alejandro Fernandez
>             Fix For: 2.2.0
> Ambari keeps track of a file, /etc/hadoop/conf/dfs_data_dir_mount.hist, that contains a mapping of HDFS data dirs to their last known mount points.
> This is used to detect when a data dir becomes unmounted, in order to prevent HDFS from writing to the root partition.
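To make the idea of a "data dir to last known mount point" mapping concrete, here is a minimal sketch that parses such a history file into a dictionary. The line format (one `data_dir,mount_point` pair per line, with `#` starting a comment) and the function name `parse_mount_history` are assumptions for illustration, not taken from the Ambari source.

```python
# Hypothetical sketch: parse the contents of a mount history file such as
# dfs_data_dir_mount.hist into {data_dir: last_known_mount_point}.
# ASSUMPTION: one "data_dir,mount_point" pair per line; '#' starts a comment.


def parse_mount_history(text: str) -> dict[str, str]:
    """Return a mapping of HDFS data dir -> last known mount point."""
    history: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        data_dir, sep, mount_point = line.partition(",")
        if not sep:
            continue  # skip malformed lines that have no comma
        history[data_dir.strip()] = mount_point.strip()
    return history


if __name__ == "__main__":
    sample = "# data_dir,mount_point\n/grid/0/data,/grid/0\n/grid/1/data,/grid/1\n"
    print(parse_mount_history(sample))
```

Because this mapping lives only in a local file, it disappears if the file is deleted, which is the weakness the issue goes on to describe.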
> Consider a DataNode host configured with these volumes:
> /dev/sda -> / 
> /dev/sdb -> /grid/0
> /dev/sdc -> /grid/1
> /dev/sdd -> /grid/2
> Typically, each /grid/#/ directory contains a data folder.
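With the volume layout above, the mount point of a data dir is the longest mounted path that contains it. The sketch below shows why an unmounted drive is dangerous: once /grid/1 is no longer mounted, /grid/1/data silently resolves to "/". The function name `find_mount_point` and passing the mount list as a plain argument (rather than reading /proc/mounts) are assumptions for illustration.

```python
import os


def find_mount_point(path: str, mount_points: list[str]) -> str:
    """Hypothetical helper: return the longest mount point that contains path.

    Falls back to "/" because the root file system is always mounted.
    """
    path = os.path.normpath(path)
    best = "/"
    for mp in mount_points:
        mp = os.path.normpath(mp)
        # mp contains path if they are equal or path lies strictly under mp.
        # Appending "/" avoids treating /grid/0 as a prefix of /grid/01.
        if path == mp or path.startswith(mp.rstrip("/") + "/"):
            if len(mp) > len(best):
                best = mp
    return best


if __name__ == "__main__":
    mounted = ["/", "/grid/0", "/grid/1", "/grid/2"]
    print(find_mount_point("/grid/1/data", mounted))
    # Simulate /dev/sdc being unmounted: /grid/1 disappears from the list.
    print(find_mount_point("/grid/1/data", ["/", "/grid/0", "/grid/2"]))
```

The first call resolves to /grid/1 and the second to "/", so anything written to /grid/1/data after the unmount lands on the root partition.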
> If hdfs-site contains dfs.datanode.failed.volumes.tolerated with a value > 0, then the DataNode will tolerate up to that many failed volumes; otherwise, the DataNode will die.
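As a rough illustration of the tolerance rule, this simplified model treats dfs.datanode.failed.volumes.tolerated as the maximum number of failed volumes the DataNode survives. The function `datanode_survives` is hypothetical, and the model only covers non-negative settings; it ignores any special values that particular Hadoop versions may accept.

```python
def datanode_survives(failed_volumes: int, tolerated: int) -> bool:
    """Simplified model of dfs.datanode.failed.volumes.tolerated.

    The DataNode keeps running while the number of failed volumes does not
    exceed the tolerated count; with the default of 0, any failure stops it.
    Only non-negative settings are modeled in this sketch.
    """
    if failed_volumes < 0 or tolerated < 0:
        raise ValueError("this sketch only models non-negative counts")
    return failed_volumes <= tolerated


if __name__ == "__main__":
    # With the default of 0, one failed volume is enough to stop the DataNode.
    print(datanode_survives(failed_volumes=1, tolerated=0))  # False
    print(datanode_survives(failed_volumes=1, tolerated=1))  # True
```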
> In AMBARI-12252, I fixed a bug so that Ambari prevents HDFS from writing to the root partition when a drive becomes unmounted.
> However, this approach relies on the /etc/hadoop/conf/dfs_data_dir_mount.hist file existing and on the original configuration being correct.
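The limitation can be seen in a minimal sketch of a history-based check. It assumes the check compares a data dir's last known mount point with its current one and refuses to create the dir only when it moved from a non-root mount to "/". The function `should_create_data_dir` is hypothetical and not the actual AMBARI-12252 code.

```python
from typing import Optional


def should_create_data_dir(last_known_mount: Optional[str], current_mount: str) -> bool:
    """Hypothetical history-based guard against writing to the root partition."""
    if last_known_mount is None:
        # No history entry (for example, the .hist file was deleted):
        # there is nothing to compare against, so the dir is allowed.
        return True
    if last_known_mount != "/" and current_mount == "/":
        # Previously on its own mount, now resolving to root: the drive is
        # most likely unmounted, so do not create the dir on "/".
        return False
    # Otherwise allow it. This includes the case where the history itself
    # says "/", i.e. the original configuration was already wrong.
    return True


if __name__ == "__main__":
    print(should_create_data_dir("/grid/1", "/"))  # False: unmount detected
    print(should_create_data_dir(None, "/"))       # True: history missing
    print(should_create_data_dir("/", "/"))        # True: bad initial config
```

The last two calls show both weaknesses described above. A missing history file and an incorrect original configuration each let a data dir on "/" go unnoticed.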
> The ideal way to fix this is to:
> * Track which data dirs the admin wants mounted on a non-root partition. If the admin wishes all data dirs to be on non-root mounts but the initial install is incorrect, this should be reported as a problem.
> * Keep the history of the mount points in the database. Today, if the cache file is deleted or the host is reimaged, this information is lost.
> * Introduce a new state between FAILED and COMPLETED, such as COMPLETED_WITH_ERRORS, that will allow tasks to appear differently in the UI, so the user can clearly tell when a critical but non-fatal error happened.
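The first and third proposals fit together: a data dir that sits on "/" while the admin requires non-root mounts is a non-fatal problem, and the proposed COMPLETED_WITH_ERRORS state lets the task report it instead of looking like a plain success. This is a minimal sketch under assumed names; `TaskStatus`, `find_root_data_dirs`, and `task_status` are hypothetical and do not reflect Ambari's actual status model.

```python
from enum import Enum


class TaskStatus(Enum):
    """Hypothetical subset of task states; COMPLETED_WITH_ERRORS is the proposed one."""
    COMPLETED = "COMPLETED"
    COMPLETED_WITH_ERRORS = "COMPLETED_WITH_ERRORS"
    FAILED = "FAILED"


def find_root_data_dirs(dir_to_mount: dict[str, str], require_non_root: bool) -> list[str]:
    """Return data dirs on "/" when the admin wants every data dir on a non-root mount."""
    if not require_non_root:
        return []
    return sorted(d for d, mount in dir_to_mount.items() if mount == "/")


def task_status(fatal_error: bool, problems: list[str]) -> TaskStatus:
    """Fatal errors fail the task; non-fatal problems are surfaced, not hidden."""
    if fatal_error:
        return TaskStatus.FAILED
    if problems:
        return TaskStatus.COMPLETED_WITH_ERRORS
    return TaskStatus.COMPLETED


if __name__ == "__main__":
    mounts = {"/grid/0/data": "/grid/0", "/grid/1/data": "/"}
    problems = find_root_data_dirs(mounts, require_non_root=True)
    print(problems, task_status(False, problems))
```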
> Integrate with the Alert Framework.

This message was sent by Atlassian JIRA
