hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4859) Add timeout in FileJournalManager
Date Wed, 29 May 2013 23:22:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669873#comment-13669873 ]

Colin Patrick McCabe commented on HDFS-4859:

Can you be more clear about why just using {{QuorumJournalManager}} plus {{ZKFC}} doesn't
solve this problem?

You don't actually even need local storage directories any more; we only ever recommended
them because QJM was new and untested.

It's not just fsync that can block forever, but any write, any read, any fstat, really any
blocking operation that touches the filesystem.  I have seen ls go out to lunch forever on
a corrupted filesystem.  Are you going to add "check if I timed out and kill myself if so"
recovery logic after every operation that touches the filesystem?  Every {{FileInputStream}}
or {{FileOutputStream}} or {{FileChannel}} method?  Are you going to carefully monitor each
new patch so that nobody adds back in a use of {{FileChannel.size()}} or whatever?
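
To make the cost concrete, here is a minimal sketch of what a per-operation timeout might look like. It is an illustration only, not code from HDFS or from any patch on this issue: the class {{TimedFsOp}} and the method {{runWithTimeout}} are hypothetical names, and it assumes a plain {{java.util.concurrent}} executor is used to bound the wait.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration; not part of HDFS. */
public class TimedFsOp {
    // Daemon threads, so a permanently stuck worker cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timed-fs-op");
        t.setDaemon(true);
        return t;
    });

    /**
     * Runs op on a worker thread and waits at most timeoutMs for it.
     * Returns true if op finished in time, false if the wait timed out.
     */
    public static boolean runWithTimeout(Callable<?> op, long timeoutMs)
            throws ExecutionException, InterruptedException {
        Future<?> f = POOL.submit(op);
        try {
            f.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // cancel(true) only interrupts the worker. A thread blocked inside
            // the kernel on a hung disk may ignore the interrupt and stay stuck.
            f.cancel(true);
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("edits", ".tmp");
        tmp.deleteOnExit();
        boolean ok = runWithTimeout(() -> {
            try (FileOutputStream out = new FileOutputStream(tmp)) {
                out.write(1);
                out.getChannel().force(true); // the fsync-style call being bounded
            }
            return null;
        }, 5000);
        System.out.println("fsync finished in time: " + ok);
    }
}
```

Note that the wrapper only bounds how long the caller waits. The worker thread may remain blocked, and every other read, write, or stat call site would need the same wrapping to get the same guarantee.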
> Add timeout in FileJournalManager
> ---------------------------------
>                 Key: HDFS-4859
>                 URL: https://issues.apache.org/jira/browse/HDFS-4859
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, namenode
>    Affects Versions: 2.0.4-alpha
>            Reporter: Kihwal Lee
> Due to the absence of an explicit timeout in FileJournalManager, error conditions that incur
> a long delay (usually until driver timeout) can make the namenode unresponsive for a long time. This
> directly affects the NN's failure detection latency, which is critical in HA.

