hadoop-yarn-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5274) Use smartctl to determine health of disks
Date Wed, 22 Jun 2016 22:58:16 GMT

    [ https://issues.apache.org/jira/browse/YARN-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345328#comment-15345328 ]

Allen Wittenauer commented on YARN-5274:

bq. The node health script is meant for the health of the node. It can't mark a single disk
as bad. 

Yes, I'm very familiar with both the health check (especially given that I'm the one who pushed for it
to get added to begin with...) and smartctl.
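The node health script contract is coarse. Per the Hadoop NodeManager documentation, the node is reported unhealthy if the script prints any line beginning with the string `ERROR`. There is no per-disk channel. The following is a minimal sketch of what a script could emit, assuming per-disk results were already gathered somehow. The function name and its input shape are hypothetical.

```python
from typing import Dict, List


def health_script_output(disk_ok: Dict[str, bool]) -> List[str]:
    """Render per-disk results in the node health script's output format.

    Any line starting with "ERROR" marks the *whole node* unhealthy; the list
    of device names is informational only and cannot take one disk out of use.
    """
    bad = sorted(dev for dev, ok in disk_ok.items() if not ok)
    if not bad:
        return []  # no ERROR line: the node is left healthy
    return ["ERROR bad disks: " + ", ".join(bad)]


if __name__ == "__main__":
    # Hypothetical input: one good disk, one bad disk.
    for line in health_script_output({"/dev/sda": True, "/dev/sdb": False}):
        print(line)
```

The granularity problem is visible in the return type: one bad disk produces one `ERROR` line, and that line takes down the entire node.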

bq. The health test that determines whether a disk is good should be valid whether the disk is an HDD or an SSD.
We shouldn't use smartctl if it doesn't apply to the storage in question; we should fall back on the
existing checks.

If I configure a file system to use /hadoop/1/tmp and /hadoop/1's mount device is hadoop1/1,
now what? Is it going to be smart enough to look at which devices the hadoop1 pool has
in it?
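The objection above can be made concrete by resolving a configured directory to its mount source. This is a minimal sketch that reads `/proc/mounts`-style text (source, mount point, fstype, ...). It picks the longest mount point that contains the path. For `/hadoop/1/tmp` it yields `hadoop1/1`, which is a pool dataset (for example, if `hadoop1` were a ZFS pool) rather than a `/dev` node that smartctl could query. The function names and the `/dev/` heuristic are hypothetical illustrations. Mapping a pool back to its member disks would need filesystem-specific tooling.

```python
from typing import Optional

# Hypothetical mount table in /proc/mounts layout.
SAMPLE_MOUNTS = (
    "/dev/sda1 / ext4 rw 0 0\n"
    "hadoop1/1 /hadoop/1 zfs rw 0 0\n"
    "/dev/sdc1 /hadoop/10 ext4 rw 0 0\n"
)


def mount_source(path: str, mounts_text: str) -> Optional[str]:
    """Return the mount source backing `path`, or None if nothing matches.

    The longest mount point containing `path` wins. Octal escapes that
    /proc/mounts uses for spaces in mount points are not decoded here.
    """
    best_point, best_source = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        source, point = fields[0], fields[1]
        # Compare whole path components so /hadoop/1 does not match /hadoop/10.
        prefix = point.rstrip("/") + "/"
        contains = path == point or path.startswith(prefix)
        if contains and len(point) > len(best_point):
            best_point, best_source = point, source
    return best_source


def looks_like_block_device(source: str) -> bool:
    # Hypothetical heuristic: smartctl expects a device node such as /dev/sda.
    return source.startswith("/dev/")


if __name__ == "__main__":
    for p in ("/hadoop/1/tmp", "/hadoop/10/tmp"):
        src = mount_source(p, SAMPLE_MOUNTS)
        kind = "device node" if src and looks_like_block_device(src) else "not a device node"
        print(p, "->", src, f"({kind})")
```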

bq. Where explicit monitoring does not exist, the NM can take some pro-active steps to detect
bad disks.

But that's my point: explicit monitoring DOES exist, just not inside Hadoop. There are whole
industries based around hardware monitoring that users should be deploying. Trying to do
it all is part of why YARN is descending into chaos. There are times when it is appropriate
to walk away and say "this isn't our core competency, let someone else do it." This is one
of them.

Besides: why is this a YARN-specific problem?  Shouldn't this be in HADOOP so that both HDFS
and YARN can take advantage of any code written? 

> Use smartctl to determine health of disks
> -----------------------------------------
>                 Key: YARN-5274
>                 URL: https://issues.apache.org/jira/browse/YARN-5274
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Varun Vasudev
> It would be nice to add support for smartctl (on machines where it is available) to determine
> disk health for the YARN local and log dirs (if smartctl is applicable). The current disk checking
> mechanism misses out on issues like bad sectors, etc.
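The proposal in the issue can be sketched as follows. This decodes the exit-status bitmask documented in the EXIT STATUS section of the smartctl(8) man page into a verdict. When the device cannot be queried at all, the verdict is "not applicable," so a caller could fall back on the existing checks. The names, the verdict enum, and the choice of which bits count as failing are assumptions for illustration. They are not taken from the issue or from Hadoop.

```python
import shutil
import subprocess
from enum import Enum


class DiskVerdict(Enum):
    HEALTHY = "healthy"
    FAILING = "failing"
    NOT_APPLICABLE = "not_applicable"  # caller falls back on existing checks


# Bit positions from the smartctl(8) EXIT STATUS section.
BAD_ARGS = 1 << 0          # command line did not parse
OPEN_FAILED = 1 << 1       # device open failed (e.g. no permission, not a disk)
COMMAND_FAILED = 1 << 2    # some SMART or other ATA command failed
DISK_FAILING = 1 << 3      # SMART status check returned "DISK FAILING"
PREFAIL_AT_THRESH = 1 << 4 # prefail attributes <= threshold
ERROR_LOG = 1 << 6         # device error log contains records of errors
SELFTEST_LOG = 1 << 7      # self-test log contains records of errors


def verdict_from_exit_status(status: int, logs_count_as_failing: bool = False) -> DiskVerdict:
    # Could not talk to the device at all: smartctl does not apply here.
    if status & (BAD_ARGS | OPEN_FAILED):
        return DiskVerdict.NOT_APPLICABLE
    failing = DISK_FAILING | PREFAIL_AT_THRESH
    if logs_count_as_failing:  # policy choice: treat logged errors as failure
        failing |= ERROR_LOG | SELFTEST_LOG
    if status & failing:
        return DiskVerdict.FAILING
    # A failed SMART command without a failing result is inconclusive.
    if status & COMMAND_FAILED:
        return DiskVerdict.NOT_APPLICABLE
    return DiskVerdict.HEALTHY


def check_disk(device: str) -> DiskVerdict:
    if shutil.which("smartctl") is None:
        return DiskVerdict.NOT_APPLICABLE
    # -H requests the overall health self-assessment; usually requires root.
    result = subprocess.run(["smartctl", "-H", device], capture_output=True, text=True)
    return verdict_from_exit_status(result.returncode)
```

Exit-status bits are checked in order, so a "DISK FAILING" result wins over an inconclusive command failure. Bits 5 to 7 (past threshold crossings and logged errors) are treated as healthy unless the caller opts in, which is a deliberately conservative default.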

This message was sent by Atlassian JIRA

