hadoop-common-issues mailing list archives

From "Arpit Agarwal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13738) DiskChecker should perform some disk IO
Date Fri, 28 Oct 2016 22:30:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616800#comment-15616800 ]

Arpit Agarwal commented on HADOOP-13738:
----------------------------------------

Thanks for the feedback, all! I've incorporated most of the comments.

bq. so I am not sure why we need random at all here
Changed the file naming scheme to use fixed names. If we hit two successive failures, we'll
try once more with a randomized file name.
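
As a rough illustration of that naming scheme (not the code in the attached patches), the
check could look something like the sketch below. The class name, the "disk-check." prefix,
and the helper names are hypothetical; only the idea of two fixed-name attempts followed by
one randomized-name attempt, plus treating a failed delete as a failure, comes from this
comment.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.UUID;

/** Hypothetical sketch; not the code from the HADOOP-13738 patches. */
class DiskIoCheckSketch {
  // Assumed values: the comment does not state the actual file names.
  static final String FIXED_PREFIX = "disk-check.";
  static final int FIXED_NAME_ATTEMPTS = 2;

  /** Writes one byte, forces it to the device, then deletes the file. */
  static void writeSyncDelete(File file) throws IOException {
    try (FileOutputStream out = new FileOutputStream(file)) {
      out.write(1);
      out.getFD().sync(); // fsync, so the write is not satisfied by the page cache alone
    } catch (IOException e) {
      file.delete(); // best-effort cleanup before reporting the failure
      throw e;
    }
    // A failed delete is reported as a check failure rather than ignored.
    if (!file.delete() && file.exists()) {
      throw new IOException("Failed to delete " + file);
    }
  }

  /** Tries the fixed file names first, then one randomized name as a last resort. */
  static void checkDir(File dir) throws IOException {
    IOException last = null;
    for (int i = 0; i < FIXED_NAME_ATTEMPTS; i++) {
      try {
        writeSyncDelete(new File(dir, FIXED_PREFIX + i));
        return; // the disk accepted the IO
      } catch (IOException e) {
        last = e;
      }
    }
    try {
      writeSyncDelete(new File(dir, FIXED_PREFIX + UUID.randomUUID()));
    } catch (IOException e) {
      e.addSuppressed(last); // keep the earlier failure for diagnostics
      throw e;
    }
  }
}
```

Fixed names keep the number of stray files bounded if cleanup ever fails, while the final
randomized attempt avoids failing the disk just because a fixed-name file is stuck.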

bq. shouldn't diskchecker be able to understand the delete operation failed
Fixed.

bq. Can we have some timer/threshold (in ms level) for the expected execution time of each
diskIoCheckWithoutNativeIo() test to break out of the retry loop
Hi [~xyao], that will require spawning a thread. DiskChecker will have to maintain a thread
pool. We could end up with many threads stalled on a slow disk and checks of healthy disks
waiting for thread availability. It is easier to solve this in the caller. Let me know if
you're okay with deferring this particular problem for now.
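
For reference, a caller-side timeout could be sketched roughly as follows. This is only an
illustration of the idea, not a proposal for this patch: the class, the Check interface, and
checkWithTimeout are hypothetical names, and the caller is assumed to own the executor.

```java
import java.io.File;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch of a timeout applied by the caller, outside DiskChecker. */
class TimedDiskCheckSketch {
  /** Stand-in for whatever disk check the caller wants to run. */
  interface Check {
    void run(File dir) throws IOException;
  }

  /**
   * Runs the check on the caller's pool and waits at most timeoutMs.
   * Returns true only if the check completed without error in time.
   */
  static boolean checkWithTimeout(ExecutorService pool, Check check, File dir,
      long timeoutMs) throws InterruptedException {
    Future<?> result = pool.submit(() -> {
      check.run(dir);
      return null;
    });
    try {
      result.get(timeoutMs, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException | ExecutionException e) {
      // Timed out or the check threw. cancel(true) only interrupts the worker;
      // a thread blocked in disk IO may stay stalled, which is the concern above.
      result.cancel(true);
      return false;
    }
  }
}
```

The caller decides how many threads it is willing to have stalled on slow disks, which is
why the pool is passed in rather than owned by the checker.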

> DiskChecker should perform some disk IO
> ---------------------------------------
>
>                 Key: HADOOP-13738
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13738
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>         Attachments: HADOOP-13738.01.patch, HADOOP-13738.02.patch, HADOOP-13738.03.patch, HADOOP-13738.04.patch
>
>
> DiskChecker can fail to detect total disk/controller failures indefinitely. We have seen
> this in real clusters. DiskChecker performs simple permissions-based checks on directories
> which do not guarantee that any disk IO will be attempted.
> A simple improvement is to write some data and flush it to the disk.




