hadoop-yarn-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-5274) Use smartctl to determine health of disks
Date Wed, 22 Jun 2016 15:34:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344527#comment-15344527
] 

Allen Wittenauer edited comment on YARN-5274 at 6/22/16 3:34 PM:
-----------------------------------------------------------------

This is pretty much one of many things a health check script *could* be doing.  One of the
key reasons why things like this weren't built into the code base early on is because it's
nearly impossible to figure out what is happening locally and do the right thing:

* What happens on SSDs?
* What if there are no SMART-enabled devices on the box?
* Do we loop through all devices and try to determine which configured directories map to which
disks?
* What if we have a volume manager or pooled storage?

etc.

This really feels like overstepping our bounds and increasing code surface area for not a
lot of win and a lot of long term pain.  This is especially true for something like smartctl
that requires privilege.  That's a ton of baggage to add.

FWIW: I feel like most of the stuff presented in the umbrella JIRA suffers from the same problems.
 If one takes a simplistic view of how machines are configured, fine.  But that may not even
cover the majority of real-world installs!
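
For illustration only, here is a minimal sketch of what the health-check-script route mentioned above might look like. It is not taken from the JIRA or from Hadoop. It assumes that the NodeManager health script convention is that a line of output beginning with ERROR marks the node unhealthy, that `smartctl -H` prints either "test result: PASSED" (ATA) or "SMART Health Status: OK" (SCSI) for a healthy device, and that the admin passes the device list on the command line. The names `classify` and `check` are hypothetical. Note that it does nothing about the open questions listed above (SSDs, volume managers, mapping directories to disks, privilege).

```python
import shutil
import subprocess
import sys

# Phrases smartctl -H commonly prints for a healthy device (ATA and SCSI forms).
# Assumption: other firmware or smartmontools versions may word this differently.
HEALTHY_MARKERS = ("test result: PASSED", "SMART Health Status: OK")


def classify(output: str) -> str:
    """Return 'ok', 'failed', or 'unknown' for the text of one `smartctl -H` run."""
    if any(marker in output for marker in HEALTHY_MARKERS):
        return "ok"
    if "test result:" in output or "SMART Health Status:" in output:
        return "failed"
    # No SMART support, a RAID or volume-manager device, missing privilege, etc.
    return "unknown"


def check(devices):
    """Return the devices whose SMART status is explicitly reported as failed."""
    if shutil.which("smartctl") is None:
        return []  # tool not installed: stay silent rather than guess
    failed = []
    for dev in devices:
        proc = subprocess.run(
            ["smartctl", "-H", dev], capture_output=True, text=True
        )
        if classify(proc.stdout) == "failed":
            failed.append(dev)
    return failed


if __name__ == "__main__":
    # Devices are supplied by the admin, e.g. /dev/sda /dev/sdb (hypothetical).
    bad = check(sys.argv[1:])
    if bad:
        # Assumed convention: an output line starting with ERROR flags the node.
        print("ERROR SMART reports failing disks: " + ", ".join(bad))
```

Treating "unknown" as healthy is a deliberate choice in this sketch: a box without SMART devices, or one where the script lacks privilege, is not reported as unhealthy.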



was (Author: aw):
This is pretty much one of many things a health check script *could* be doing.  One of the
key reasons why things like this weren't built into the code base early on is because it's
nearly impossible to figure out what is happening locally and do the right thing:

* What happens on SSDs?
* What if there are no SMART enabled devices on the box? 
* Do we loop through all devices are try to figure out what configured directories map to
which disks?
* What if we have a volume manager or pooled storage?

etc.

This really feels like overstepping our bounds and increasing code surface area for not a
lot of win and a lot of long term pain.  This is especially true for something like smartctl
that requires privilege.  That's a ton of baggage to add.

FWIW: I feel like most of the stuff presented in the umbrella JIRA suffers from the same problems.
 If one takes a simplistic view of how machines are configured, fine.  But that may not even
cover the majority of real-world installs!


> Use smartctl to determine health of disks
> -----------------------------------------
>
>                 Key: YARN-5274
>                 URL: https://issues.apache.org/jira/browse/YARN-5274
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Varun Vasudev
>
> It would be nice to add support for smartctl (on machines where it is available) to determine
> disk health for the YARN local and log dirs (if smartctl is applicable). The current disk checking
> mechanism misses out on issues like bad sectors, etc.




