Date: Fri, 1 Nov 2013 22:41:17 +0000 (UTC)
From: "Ravi Prakash (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again

[ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811724#comment-13811724 ]

Ravi Prakash commented on YARN-90:
----------------------------------

Apart from the DirectoryCollection changes, I think we should also update LocalDirAllocator.AllocatorPerContext, though maybe that should be handled in a separate JIRA. I also noticed that after this patch, although DirectoryCollection recovered the repaired directories, they were not actually used. I wonder whether something is wrong with my test procedure or whether we need more changes.
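To make the behaviour under discussion concrete, here is a minimal sketch of a directory list that re-tests previously failed directories on every health check, so that a repaired directory moves back to the good list. It is only an illustration: the class RecoverableDirs and its methods are hypothetical names, not the Hadoop DirectoryCollection API, and the usability test is a simplified assumption rather than the check the patch performs.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; this is not Hadoop's DirectoryCollection. */
public class RecoverableDirs {
    private final List<File> goodDirs = new ArrayList<>();
    private final List<File> failedDirs = new ArrayList<>();

    public RecoverableDirs(List<File> dirs) {
        goodDirs.addAll(dirs);
    }

    /** Assumed health test: the path is a directory we can read, write and enter. */
    private static boolean isUsable(File dir) {
        return dir.isDirectory() && dir.canRead() && dir.canWrite() && dir.canExecute();
    }

    /**
     * Re-tests every known directory, including ones that failed earlier,
     * so a repaired directory is not stuck in the failed list forever.
     *
     * @return true if either list changed since the previous check
     */
    public synchronized boolean checkDirs() {
        List<File> all = new ArrayList<>(goodDirs);
        all.addAll(failedDirs);

        List<File> newGood = new ArrayList<>();
        List<File> newFailed = new ArrayList<>();
        for (File dir : all) {
            if (isUsable(dir)) {
                newGood.add(dir);
            } else {
                newFailed.add(dir);
            }
        }

        boolean changed = !newGood.equals(goodDirs) || !newFailed.equals(failedDirs);
        goodDirs.clear();
        goodDirs.addAll(newGood);
        failedDirs.clear();
        failedDirs.addAll(newFailed);
        return changed;
    }

    /** Returns a copy, so callers holding an old list must ask again after a change. */
    public synchronized List<File> getGoodDirs() {
        return new ArrayList<>(goodDirs);
    }

    public synchronized List<File> getFailedDirs() {
        return new ArrayList<>(failedDirs);
    }
}
```

In this sketch getGoodDirs() returns a copy, so any consumer that caches the list has to re-read it whenever checkDirs() reports a change. That would be one possible explanation for recovered directories that are detected but never used, though whether it applies to the patch is not established here.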
> NodeManager should identify failed disks becoming good back again
> -----------------------------------------------------------------
>
>                 Key: YARN-90
>                 URL: https://issues.apache.org/jira/browse/YARN-90
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Ravi Gummadi
>         Attachments: YARN-90.1.patch, YARN-90.patch
>
>
> MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes down, it is marked as failed forever. To reuse that disk after it becomes good again, NodeManager needs a restart. This JIRA is to improve NodeManager so that it reuses disks that have become good again after having been bad some time earlier.

--
This message was sent by Atlassian JIRA
(v6.1#6144)