hadoop-hdfs-issues mailing list archives

From "Rushabh S Shah (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-5522) Datanode disk error check may be incorrectly skipped
Date Sat, 10 May 2014 21:57:24 GMT

     [ https://issues.apache.org/jira/browse/HDFS-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated HDFS-5522:

    Attachment: HDFS-5522-v3.patch

Thanks, Kihwal, for your comments.
I incorporated your first and second comments.
For the third comment, I agree that the start and termination of the checkDiskError thread
should be logged. However, I don't think outside the synchronized block is a good place for
it, since it would then log every time checkDiskError() is called.
So I put the logging inside the synchronized block.
Let me know if you have more comments.
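
To illustrate the logging placement discussed above, here is a minimal sketch. It is not the code from HDFS-5522-v3.patch: the class DiskCheckSketch, the awaitIdle() helper, the in-memory log list and the log messages are all hypothetical stand-ins, and the real DataNode logic may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; not the actual DataNode code or the attached patch. */
class DiskCheckSketch {
    private final Object lock = new Object();
    private final Runnable diskCheck;   // stand-in for the real disk scan
    private Thread checkThread;         // non-null while a check is in flight
    // Stand-in for a real logger, so the behavior can be inspected.
    final List<String> log = Collections.synchronizedList(new ArrayList<>());

    DiskCheckSketch(Runnable diskCheck) {
        this.diskCheck = diskCheck;
    }

    /** May be called many times, from many request-handling threads. */
    void checkDiskError() {
        synchronized (lock) {
            if (checkThread != null) {
                return; // a check is already running: nothing started, nothing logged
            }
            // Logged inside the block, so it appears once per thread actually started.
            log.add("Starting checkDiskError thread");
            checkThread = new Thread(() -> {
                try {
                    diskCheck.run();
                } finally {
                    synchronized (lock) {
                        log.add("checkDiskError thread finished");
                        checkThread = null;
                    }
                }
            });
            checkThread.start();
        }
    }

    /** Helper for callers that want to wait for an in-flight check. */
    void awaitIdle() {
        Thread t;
        synchronized (lock) {
            t = checkThread;
        }
        if (t == null) {
            return;
        }
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In this sketch, repeated calls made while a check is still running return early and add no log lines. A log statement placed before the synchronized block would instead fire on every call, which is the concern raised above.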

> Datanode disk error check may be incorrectly skipped
> ----------------------------------------------------
>                 Key: HDFS-5522
>                 URL: https://issues.apache.org/jira/browse/HDFS-5522
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.23.9, 2.2.0
>            Reporter: Kihwal Lee
>            Assignee: Rushabh S Shah
>         Attachments: HDFS-5522-v2.patch, HDFS-5522-v3.patch, HDFS-5522.patch
> After HDFS-4581 and HDFS-4699, {{checkDiskError()}} is not called when network errors
> occur while processing data node requests. This appears to cause trouble when a disk is
> having problems but is not failing I/O promptly.
> If I/O hangs for a long time, the network read/write may time out first and the peer may close
> the connection. Although the error was caused by a faulty local disk, the disk check is not
> carried out in this case.
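
The failure mode in the description can be sketched as follows. This is an illustration only: the class ErrorPolicySketch, both method names, and the choice of exception types treated as "network errors" are assumptions made for the example, not the actual DataNode code or the change made by HDFS-4581 and HDFS-4699.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.nio.channels.ClosedChannelException;

/** Hypothetical illustration; names and exception choices are assumptions. */
class ErrorPolicySketch {

    /**
     * Policy that skips the disk check when the error looks network-related.
     * Returns true when a disk check should be triggered.
     */
    static boolean shouldCheckDiskFiltered(IOException e) {
        boolean looksLikeNetwork = e instanceof SocketTimeoutException
                || e instanceof ClosedChannelException;
        return !looksLikeNetwork;
    }

    /** Alternative policy: any I/O error during a request triggers a check. */
    static boolean shouldCheckDiskAlways(IOException e) {
        return true;
    }
}
```

If a local disk hangs long enough for the socket to time out first, the request fails with a network-style exception. Under the filtered policy that error never reaches the disk check, even though the disk was the root cause.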

This message was sent by Atlassian JIRA
