Return-Path:
X-Original-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org
Delivered-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 26C377FFD for ; Mon, 10 Oct 2011 23:04:52 +0000 (UTC)
Received: (qmail 77263 invoked by uid 500); 10 Oct 2011 23:04:51 -0000
Delivered-To: apmail-hadoop-hdfs-issues-archive@hadoop.apache.org
Received: (qmail 77151 invoked by uid 500); 10 Oct 2011 23:04:51 -0000
Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: hdfs-issues@hadoop.apache.org
Delivered-To: mailing list hdfs-issues@hadoop.apache.org
Received: (qmail 77134 invoked by uid 99); 10 Oct 2011 23:04:51 -0000
Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 10 Oct 2011 23:04:51 +0000
X-ASF-Spam-Status: No, hits=-2000.5 required=5.0 tests=ALL_TRUSTED,RP_MATCHES_RCVD
X-Spam-Check-By: apache.org
Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 10 Oct 2011 23:04:50 +0000
Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id 21BC1301290 for ; Mon, 10 Oct 2011 23:04:30 +0000 (UTC)
Date: Mon, 10 Oct 2011 23:04:30 +0000 (UTC)
From: "Milind Bhandarkar (Commented) (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID: <1472881219.16816.1318287870139.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1407279375.12068.1318113089869.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HDFS-2422) The NN should tolerate the same number of low-resource volumes as failed volumes
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

[
https://issues.apache.org/jira/browse/HDFS-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124552#comment-13124552 ]

Milind Bhandarkar commented on HDFS-2422:
-----------------------------------------

Aaron, the failed volume policy should ensure that at least two volumes are up when writing edit logs. If the NN were writing to only one volume and remained writable, there would be a period during which only a single up-to-date replica of the edit logs exists; if that replica fails, modifications are lost. (That is why I said the window opens for losing data, not that it will definitely lose data.)

Re: automatically coming out of safemode, I think transient unavailability of a volume and a volume being low on disk space should be treated differently. While the second case requires admin intervention, the first case does not.

> The NN should tolerate the same number of low-resource volumes as failed volumes
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-2422
>                 URL: https://issues.apache.org/jira/browse/HDFS-2422
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.24.0
>            Reporter: Jeff Bean
>            Assignee: Aaron T. Myers
>         Attachments: HDFS-2422.patch
>
>
> We encountered a situation where the namenode dropped into safe mode after a temporary outage of an NFS mount.
> At 12:10 the NFS server goes offline
> Oct 8 12:10:05 kernel: nfs: server not responding, timed out
> This caused the namenode to conclude resource issues:
> 2011-10-08 12:10:34,848 WARN org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space available on volume '' is 0, which is below the configured reserved amount 104857600
> Temporary loss of NFS mount shouldn't cause safemode.

--
This message is automatically generated by JIRA.