hadoop-common-issues mailing list archives

From "Xiaoyu Yao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13738) DiskChecker should perform some disk IO
Date Tue, 25 Oct 2016 21:35:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606549#comment-15606549
] 

Xiaoyu Yao commented on HADOOP-13738:
-------------------------------------

Thanks [~arpitagarwal] for working on this, and [~kihwal] and [~anu] for the discussion.

I can see some benefits to using a random file name. The DiskChecker may run multiple times,
and a random file name is not affected by a failed deletion from a previous run. If we want
to use a fixed pattern for test file naming, we should clean up files left by previous runs before
the disk check, as [~arpitagarwal] has already done in the unit test.
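
As a rough illustration of the two naming options above, here is a minimal Java sketch. It is
not taken from the attached patches: the class name, the "disk-check-" prefix, and both helper
methods are hypothetical. cleanStaleFiles() removes leftovers that match the naming pattern, and
newRandomTestFile() creates a uniquely named file so that a leftover from an earlier run cannot
collide with the current check.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

public class DiskCheckFileNames {
  // Hypothetical prefix chosen for this sketch; not the name used by the patch.
  static final String PREFIX = "disk-check-";

  /** Deletes files left behind by earlier runs; returns how many were removed. */
  public static int cleanStaleFiles(Path dir) throws IOException {
    int removed = 0;
    // The glob matches every entry in dir whose name starts with PREFIX.
    try (DirectoryStream<Path> stale = Files.newDirectoryStream(dir, PREFIX + "*")) {
      for (Path p : stale) {
        if (Files.deleteIfExists(p)) {
          removed++;
        }
      }
    }
    return removed;
  }

  /** Creates an empty test file with a random suffix, so stale files never collide with it. */
  public static Path newRandomTestFile(Path dir) throws IOException {
    return Files.createFile(dir.resolve(PREFIX + UUID.randomUUID()));
  }
}
```

The two approaches can be combined: random names keep a check independent of earlier failures,
and the pattern-based cleanup keeps leftovers from accumulating.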

Can we have a timer/threshold (at the millisecond level) on the expected execution time of each diskIoCheckWithoutNativeIo()
check, so that we can break out of the retry loop? This way, we won't have to wait forever even with the
current serialized disk check in the datanode.
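
One possible shape for such a threshold, sketched in Java under stated assumptions: BoundedRetry,
IoCheck, and runWithDeadline() are hypothetical names, and the attempt count and time budget are
illustrative parameters, not values from the patch. The loop stops retrying once the budget is
spent. Note that it only checks the clock between attempts, so it cannot interrupt a single IO
call that hangs; that would need the check to run on a separate thread.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class BoundedRetry {
  /** A single disk IO check that may fail with an IOException. */
  public interface IoCheck {
    void run() throws IOException;
  }

  /**
   * Runs the check up to maxAttempts times, but stops retrying once budgetMs
   * milliseconds have elapsed. Rethrows the last failure if no attempt succeeds.
   */
  public static void runWithDeadline(IoCheck check, int maxAttempts, long budgetMs)
      throws IOException {
    final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(budgetMs);
    IOException last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        check.run();
        return; // The check passed; no further attempts are needed.
      } catch (IOException e) {
        last = e;
      }
      // Compare via subtraction so the test stays correct if nanoTime wraps around.
      if (System.nanoTime() - deadline >= 0) {
        break; // Time budget exhausted; give up instead of retrying.
      }
    }
    throw last != null ? last : new IOException("no attempts made");
  }
}
```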

> DiskChecker should perform some disk IO
> ---------------------------------------
>
>                 Key: HADOOP-13738
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13738
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>         Attachments: HADOOP-13738.01.patch, HADOOP-13738.02.patch, HADOOP-13738.03.patch
>
>
> DiskChecker can fail to detect total disk/controller failures indefinitely. We have seen
> this in real clusters. DiskChecker performs simple permissions-based checks on directories
> which do not guarantee that any disk IO will be attempted.
> A simple improvement is to write some data and flush it to the disk.
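
For readers unfamiliar with the idea in the description, a minimal Java sketch of "write some
data and flush it to the disk" might look like the following. It is an illustration only, not
the code in the attached patches: WriteAndSync and checkDiskIo() are hypothetical names. The
key step is FileDescriptor.sync(), which asks the operating system to push the written byte to
the underlying device rather than leaving it in a cache.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class WriteAndSync {
  /** Writes one byte to a temporary file in dir and syncs it to the device. */
  public static void checkDiskIo(File dir) throws IOException {
    // createTempFile picks a unique name, so leftovers from earlier runs do not collide.
    File f = File.createTempFile("disk-check-", ".tmp", dir);
    try {
      try (FileOutputStream out = new FileOutputStream(f)) {
        out.write(new byte[] {1});
        // Force the data out of OS buffers; throws SyncFailedException on failure.
        out.getFD().sync();
      }
    } finally {
      // Best-effort cleanup; a failed delete is ignored in this sketch.
      f.delete();
    }
  }
}
```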



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

