hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4859) Add timeout in FileJournalManager
Date Tue, 28 May 2013 22:50:21 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668772#comment-13668772 ]

Kihwal Lee commented on HDFS-4859:

bq. If the NameNode hangs, ZKFC will detect it.

I understand that ZKFC will detect a failure if the NN stops responding to RPCs or if the
internal resource check fails. If all RPC handlers are stuck waiting for a very long logSync()
to finish, that may be detected as well. But if only a couple of handlers are stuck on an
I/O hang while all the others keep serving reads normally, the error condition may not be
detected in time. The situation would, of course, be different if the underlying journal flush
could time out.
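To illustrate what "the journal flush can time out" would mean for a handler thread, here is a minimal sketch, assuming the flush can be wrapped as a `Callable`. The class `TimedFlush` and the method `flushWithTimeout` are hypothetical names for illustration only; they are not part of FileJournalManager or any Hadoop API. The blocking flush runs on a worker thread, and the caller waits at most `timeoutMs` before giving up.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not Hadoop code): bounds how long a caller waits
// for a blocking journal flush.
class TimedFlush {
    private static final ExecutorService FLUSHER =
        Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "journal-flusher");
            t.setDaemon(true); // a hung flush must not keep the JVM alive
            return t;
        });

    /** Returns true if the flush finished within timeoutMs, false on timeout or error. */
    static boolean flushWithTimeout(Callable<Void> flush, long timeoutMs) {
        Future<Void> f = FLUSHER.submit(flush);
        try {
            f.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Best effort: a thread blocked inside the kernel may ignore the interrupt,
            // but the caller is released either way.
            f.cancel(true);
            return false;
        } catch (ExecutionException e) {
            return false; // the flush itself threw, e.g. an IOException
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        boolean fast = flushWithTimeout(() -> null, 1000);
        boolean hung = flushWithTimeout(() -> { Thread.sleep(60_000); return null; }, 200);
        System.out.println("fast flush ok=" + fast + ", hung flush ok=" + hung);
    }
}
```

The point of the design is that a false return gives the handler a bounded moment at which it can report the storage as failed, instead of blocking until the driver timeout. The worker thread may stay stuck, but the RPC handler does not.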

I think adding a timeout will still be useful, since users can run a combination of an HA-JM
and FJM. Ideally, the NN should be able to detect and exclude failed storages with predictable,
configurable latency, regardless of the underlying implementation.
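The goal of excluding failed storages with configurable latency can be sketched as follows, under clearly stated assumptions. `BoundedJournalSet`, its nested `Journal` interface, `syncAll`, and `activeJournals` are hypothetical names; this is not Hadoop's JournalSet, only an illustration of the idea. All active journals are flushed in parallel against one shared deadline. Any journal that throws or misses the deadline is removed from the active set, so one hung storage cannot stall the sync beyond `timeoutMs`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch (not Hadoop's JournalSet): excludes journals that fail
// or do not finish flushing within a configurable timeout.
class BoundedJournalSet {
    /** Stand-in for a journal's blocking flush; assumed, not a Hadoop interface. */
    interface Journal { void flush() throws Exception; }

    private final Map<String, Journal> active = new LinkedHashMap<>();
    private final long timeoutMs;
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "journal-flush");
        t.setDaemon(true); // hung flushes must not keep the JVM alive
        return t;
    });

    BoundedJournalSet(Map<String, Journal> journals, long timeoutMs) {
        active.putAll(journals);
        this.timeoutMs = timeoutMs;
    }

    /** Flushes all active journals in parallel; returns the names excluded this round. */
    synchronized List<String> syncAll() throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        Map<String, Future<?>> pending = new LinkedHashMap<>();
        for (Map.Entry<String, Journal> e : active.entrySet()) {
            Journal j = e.getValue();
            pending.put(e.getKey(), pool.submit(() -> { j.flush(); return null; }));
        }
        List<String> excluded = new ArrayList<>();
        for (Map.Entry<String, Future<?>> e : pending.entrySet()) {
            // All journals share one deadline, so total latency is bounded by timeoutMs.
            long remaining = Math.max(0, deadline - System.nanoTime());
            try {
                e.getValue().get(remaining, TimeUnit.NANOSECONDS);
            } catch (TimeoutException | ExecutionException ex) {
                e.getValue().cancel(true); // best effort; a thread stuck in I/O may not react
                excluded.add(e.getKey());
            }
        }
        active.keySet().removeAll(excluded);
        if (active.isEmpty()) {
            throw new IllegalStateException("all journals failed; the NN cannot continue");
        }
        return excluded;
    }

    synchronized Set<String> activeJournals() {
        return new LinkedHashSet<>(active.keySet());
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Journal> journals = new LinkedHashMap<>();
        journals.put("local-disk", () -> { });
        journals.put("hung-nfs", () -> Thread.sleep(60_000));
        BoundedJournalSet set = new BoundedJournalSet(journals, 500);
        System.out.println("excluded: " + set.syncAll());
        System.out.println("still active: " + set.activeJournals());
    }
}
```

Because every journal gets the same deadline rather than a per-journal timeout applied one after another, the worst-case sync latency stays close to `timeoutMs` no matter how many storages hang at once. The same bound applies whether a journal is a local FJM directory or a remote HA-JM.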
> Add timeout in FileJournalManager
> ---------------------------------
>                 Key: HDFS-4859
>                 URL: https://issues.apache.org/jira/browse/HDFS-4859
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, namenode
>    Affects Versions: 2.0.4-alpha
>            Reporter: Kihwal Lee
> Due to the absence of an explicit timeout in FileJournalManager, error conditions that incur
> a long delay (usually until the driver times out) can make the NameNode unresponsive for a long time.
> This directly affects the NN's failure detection latency, which is critical in HA.

This message is automatically generated by JIRA.
