Date: Tue, 11 Oct 2011 16:27:11 +0000 (UTC)
From: "M. C. Srivas (Commented) (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID: <805554695.1347.1318350431883.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1407279375.12068.1318113089869.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HDFS-2422) The NN should tolerate the same number of low-resource volumes as failed volumes

    [ https://issues.apache.org/jira/browse/HDFS-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125165#comment-13125165 ]

M. C. Srivas commented on HDFS-2422:
------------------------------------

@Todd: With soft mounts, if the server goes down, I'd expect the fsync to fail. However, you would have no guarantee about what happened to the writes issued between the last successful fsync and the failed one. Some of them might have succeeded and some might have been lost. Conceivably, some of them might be performed again when the server recovers.

So I'd recommend that once you switch from one log to another, you unlink the previous one when you get the chance, before using it again, just to make sure no ghost writes show up later.

> The NN should tolerate the same number of low-resource volumes as failed volumes
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-2422
>                 URL: https://issues.apache.org/jira/browse/HDFS-2422
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.24.0
>            Reporter: Jeff Bean
>            Assignee: Aaron T. Myers
>             Fix For: 0.24.0
>
>         Attachments: HDFS-2422.patch
>
>
> We encountered a situation where the namenode dropped into safe mode after a temporary outage of an NFS mount.
> At 12:10 the NFS server goes offline
> Oct 8 12:10:05 kernel: nfs: server not responding, timed out
> This caused the namenode to conclude resource issues:
> 2011-10-08 12:10:34,848 WARN org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space available on volume '' is 0, which is below the configured reserved amount 104857600
> Temporary loss of NFS mount shouldn't cause safemode.
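
The unlink-before-reuse recommendation in the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class `LogSwitchSketch`, the methods `appendAndSync` and `retireBeforeReuse`, and the file names `edits.a` and `edits.b` are hypothetical and are not taken from the HDFS code base. It uses plain `java.nio` calls on a local temp directory, so it shows the ordering of steps rather than actual NFS behavior.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: alternate between two log files and unlink the
// retired one before it is ever written to again.
class LogSwitchSketch {

    // Append one record and force it to storage.
    // An IOException here is the signal to switch to the other log.
    static void appendAndSync(Path log, String record) throws IOException {
        try (FileChannel ch = FileChannel.open(log,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // sync data and metadata
        }
    }

    // Unlink the retired log and recreate it empty, so the reused name
    // refers to a fresh file rather than the one with uncertain writes.
    static void retireBeforeReuse(Path oldLog) throws IOException {
        Files.deleteIfExists(oldLog);
        Files.createFile(oldLog);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("log-switch-sketch");
        Path logA = dir.resolve("edits.a"); // assumed name
        Path logB = dir.resolve("edits.b"); // assumed name

        Path active = logA;
        try {
            appendAndSync(active, "op1\n");
        } catch (IOException syncFailed) {
            // State of logA since its last good sync is unknown: switch logs.
            active = logB;
            appendAndSync(active, "op1\n");
        }

        // Later, before logA is used again, unlink it first.
        if (active == logB) {
            retireBeforeReuse(logA);
        }
        System.out.println("active log: " + active.getFileName());
    }
}
```

The design point is only the ordering: the old file is removed before its name is reused, so that writes still pending against the old file have nothing live to land in. Whether a given NFS client and server actually discard such writes after the unlink is an assumption that would need checking against the deployment.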