Date: Tue, 11 Oct 2011 07:04:30 +0000 (UTC)
From: "M. C. Srivas (Commented) (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Message-ID: <1049347934.18138.1318316670230.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1407279375.12068.1318113089869.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HDFS-2422) The NN should tolerate the same number of low-resource volumes as failed volumes

    [ https://issues.apache.org/jira/browse/HDFS-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124766#comment-13124766 ]

M. C. Srivas commented on HDFS-2422:
------------------------------------

Konstantin and Todd, should the timeout be short, or long?

From the NFS FAQ ... http://nfs.sourceforge.net/#faq_e4 ... soft mounts can cause silent data corruption, even in the middle of a file, when a brief outage occurs. Thus, during recovery, even though the edits-log looks up-to-date, it might contain bad pages in the middle.

If you wish to use soft-mounts, then the recovery process should verify all the logs before picking one of them to use for replay. (I am not sure if there are CRCs on every record of the edits-log .. are there?) Otherwise, with soft-mounts, you will hit issues like HDFS-1382.

> The NN should tolerate the same number of low-resource volumes as failed volumes
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-2422
>                 URL: https://issues.apache.org/jira/browse/HDFS-2422
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.24.0
>            Reporter: Jeff Bean
>            Assignee: Aaron T. Myers
>             Fix For: 0.24.0
>
>         Attachments: HDFS-2422.patch
>
>
> We encountered a situation where the namenode dropped into safe mode after a temporary outage of an NFS mount.
> At 12:10 the NFS server goes offline
> Oct 8 12:10:05 kernel: nfs: server not responding, timed out
> This caused the namenode to conclude resource issues:
> 2011-10-08 12:10:34,848 WARN org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space available on volume '' is 0, which is below the configured reserved amount 104857600
> Temporary loss of NFS mount shouldn't cause safemode.
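
The suggestion in the comment above, to verify every copy of the log before picking one for replay, can be illustrated roughly as follows. This is only a sketch under stated assumptions: the record framing (length, payload, CRC32) is hypothetical and is not the actual HDFS edits-log layout, and the names encode_record, verify_log and choose_log are made up for illustration. Whether the real edits-log carries a CRC on every record is the open question raised in the comment.

```python
import struct
import zlib
from typing import Dict, Optional, Tuple

# Hypothetical framing, NOT the real HDFS edits-log layout:
# [4-byte big-endian payload length][payload][4-byte big-endian CRC32 of payload]
HEADER = struct.Struct(">I")
TRAILER = struct.Struct(">I")


def encode_record(payload: bytes) -> bytes:
    """Frame one record with its length and the CRC32 of its payload."""
    return HEADER.pack(len(payload)) + payload + TRAILER.pack(zlib.crc32(payload))


def verify_log(data: bytes) -> Tuple[int, bool]:
    """Return (valid records read before stopping, True if the whole log verified)."""
    offset = 0
    count = 0
    while offset < len(data):
        if offset + HEADER.size > len(data):
            return count, False  # truncated length field
        (length,) = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        end = start + length
        if end + TRAILER.size > len(data):
            return count, False  # truncated payload or checksum
        (stored_crc,) = TRAILER.unpack_from(data, end)
        if zlib.crc32(data[start:end]) != stored_crc:
            return count, False  # corrupt record in the middle of the log
        count += 1
        offset = end + TRAILER.size
    return count, True


def choose_log(candidates: Dict[str, bytes]) -> Optional[str]:
    """Pick the fully verified copy with the most records, or None if none verify."""
    best_name = None
    best_count = -1
    for name, data in candidates.items():
        count, clean = verify_log(data)
        if clean and count > best_count:
            best_name, best_count = name, count
    return best_name
```

With this framing, a copy that has the expected length but a damaged record in the middle fails verification and is never chosen, even though by size alone it "looks up-to-date". If no copy verifies, choose_log returns None, leaving the decision to the operator.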