hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3962) NN should periodically check writability of 'required' journals
Date Fri, 03 May 2013 15:30:16 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648496#comment-13648496 ]

Kihwal Lee commented on HDFS-3962:

In general, the storage error detection and handling are not ideal.

* NNStorage : The checks here seem mainly aimed at detecting a read-only file system (ROFS). They are performed
when rolling edits or writing an fsimage.
* FSEditLog : I/O errors are handled when logSync() fails. The storage state is kept locally
and is not shared with NNStorage.
* Resource monitor : Enters safe mode if resource requirements are not met, e.g. not enough disk space.

Since NNStorage is unaware of the errors detected in FSEditLog, the failed storage directories
will be retried on the next edit roll. The restoreFailedStorage setting and the corresponding
admin command have no effect, since they apply only to NNStorage.  It would be better if these
were tied together in some way, especially when journals are also written to the image directory.
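To illustrate one way these could be tied together, here is a minimal sketch of a shared failure registry that both FSEditLog and NNStorage could consult. The class and method names are hypothetical, for illustration only, and are not the actual HDFS API:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: a shared registry of failed storage directories,
 * so errors detected by FSEditLog during logSync() become visible to
 * NNStorage (and to restoreFailedStorage) instead of being kept locally.
 */
public class StorageHealthRegistry {
    private final Set<String> failed = ConcurrentHashMap.newKeySet();

    /** Called by any component that detects an error, e.g. on a logSync() failure. */
    public void reportFailure(String storageDir) {
        failed.add(storageDir);
    }

    /** Called when an admin restores storage or a retry succeeds. */
    public void reportRestored(String storageDir) {
        failed.remove(storageDir);
    }

    /** Checked before retrying a directory on the next edit roll. */
    public boolean isFailed(String storageDir) {
        return failed.contains(storageDir);
    }
}
```

With something like this, restoreFailedStorage could be honored uniformly: a directory marked failed by FSEditLog would not be silently retried on the next edit roll unless restoration is allowed.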

Another important missing piece is a configurable storage timeout. I assume the HA journal managers
are fine and it is just FileJournalManager. In most cases, logSync() will be the first place
in the namenode to see storage errors. (Regular logging will also see them if the same fs is used.)
For local disks, the I/O timeout depends on the driver and is usually much longer than what
an HA namenode wants it to be.  If a thread stuck in flush() could be interrupted on timeout,
local storage errors could be detected more reliably and service availability would improve.
Unfortunately, most test cases simulate only ROFS or layout changes, because this condition is
difficult to simulate or create, so this most common failure mode is often missed in testing. You
might have more experience on this, as you have done a lot of testing for HA.
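To make the interruptible-flush idea concrete, here is a minimal sketch (hypothetical names, not the actual FileJournalManager API) that bounds a potentially hanging flush by running it on a separate thread and timing out the wait. One caveat to note in a real implementation: a thread blocked in classic blocking disk I/O may not respond to interruption, whereas NIO channels close on interrupt, so how well cancel(true) works depends on the underlying stream:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical sketch: bound a potentially hanging flush() with a timeout
 * so the namenode can mark the journal failed instead of blocking forever.
 */
public class TimedFlush {
    private final ExecutorService flusher = Executors.newSingleThreadExecutor();

    /** Returns true if flushOp completed within timeoutMs, false otherwise. */
    public boolean flushWithTimeout(Runnable flushOp, long timeoutMs) {
        Future<?> f = flusher.submit(flushOp);
        try {
            f.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Try to interrupt the stuck flush; the caller should then
            // mark this storage directory as failed.
            f.cancel(true);
            return false;
        } catch (InterruptedException | ExecutionException e) {
            return false;
        }
    }

    public void shutdown() {
        flusher.shutdownNow();
    }
}
```

The timeout value is exactly the knob being proposed: something an HA namenode could configure to be much shorter than the driver-level I/O timeout on a failing local disk.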

For NFS mounts, one could adjust the mount options to make I/O time out early rather than hang.
But I still think FileJournalManager should have the capability to time out.  Even if the NN does
not depend on local storage for data consistency and correctness, service availability does
suffer if the NN has no control over how long it will block.

I don't mean to hijack this jira, but I wanted to hear your view on storage error detection
and handling, and perhaps file jiras if we can identify what should be done.
> NN should periodically check writability of 'required' journals
> ---------------------------------------------------------------
>                 Key: HDFS-3962
>                 URL: https://issues.apache.org/jira/browse/HDFS-3962
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: ha, namenode
>    Affects Versions: 3.0.0, 2.0.1-alpha
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-3962.txt
> Currently, our HA design ensures "write fencing" by having the failover controller call
a fencing script before transitioning a new node to active. However, if the fencing script
is based on storage fencing (and not STONITH), there is no _read_ fencing. That is to say,
the old active may continue to believe itself active for an unbounded amount of time, as long
as it does not try to write to its edit log.
> This isn't super problematic, but it would be beneficial for monitoring, etc., to have
the old NN periodically check the writability of any "required" journals and abort if they
become unwritable, even if no writes are coming into the system.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
