From: "Manoj Govindassamy (JIRA)"
To: hdfs-issues@hadoop.apache.org
Date: Sat, 14 Jan 2017 00:55:26 +0000 (UTC)
Subject: [jira] [Updated] (HDFS-11340) DataNode reconfigure for disks doesn't remove the failed volumes

     [ https://issues.apache.org/jira/browse/HDFS-11340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manoj Govindassamy updated HDFS-11340:
--------------------------------------
    Attachment: HDFS-11340.01.patch

Attached v01 to address the
following:
* {{DataNode#parseChangedVolumes}} updated to correctly detect the removal of a failed volume during reconfigure with a new conf
* {{DataNode#refreshVolumes}} updated to include the failed volume count when validating the new conf
* {{FsDatasetImpl#removeVolumes}} updated to prune the failed volumes list at the time of reconfigure
* {{TestDataNodeVolumeFailureReporting}} updated with a new unit test and corner cases around reconfigure with failed volume removal

[~eddyxu], can you please take a look at the patch?

> DataNode reconfigure for disks doesn't remove the failed volumes
> ----------------------------------------------------------------
>
>                 Key: HDFS-11340
>                 URL: https://issues.apache.org/jira/browse/HDFS-11340
>             Project: Hadoop HDFS
>          Issue Type: Bug
>   Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-11340.01.patch
>
>
> Say a DataNode (uuid:xyz) has disks D1 and D2. When D1 turns bad, a JMX query on FSDatasetState-xyz for the "NumFailedVolumes" attr correctly shows the failed volume count as 1, and the "FailedStorageLocations" attr lists the failed storage location as "D1".
> It is possible to add disks to or remove disks from this DataNode by running the {{reconfigure}} command. Let the failed disk D1 be removed from the conf, so that the new conf has only one good disk, D2. After running the reconfigure command for this DataNode with the new disk conf, the expectation is that the DataNode would no longer report "NumFailedVolumes" or "FailedStorageLocations". But even after the failed disk is removed from the conf and the reconfigure succeeds, the DataNode continues to show "NumFailedVolumes" as 1 and "FailedStorageLocations" as "D1", and these values never get reset.
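
The expected behavior from the issue description can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: the class {{FailedVolumeSketch}} and its methods are hypothetical names invented for illustration, and none of this is the Hadoop code or the attached patch. It only shows the idea of pruning the failed volumes list against the new conf at reconfigure time, so that the two reported values reset once D1 is removed.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of DataNode volume bookkeeping; not Hadoop code. */
public class FailedVolumeSketch {
    // Volumes currently listed in the (modelled) disk conf.
    private final Set<String> configured = new LinkedHashSet<>();
    // Volumes that have been marked as failed.
    private final Set<String> failed = new LinkedHashSet<>();

    public FailedVolumeSketch(List<String> initialVolumes) {
        configured.addAll(initialVolumes);
    }

    /** Record a failure, but only for a volume that is part of the conf. */
    public void markFailed(String volume) {
        if (configured.contains(volume)) {
            failed.add(volume);
        }
    }

    /** Apply a new conf and drop failed volumes that are no longer listed in it. */
    public void reconfigure(List<String> newVolumes) {
        configured.clear();
        configured.addAll(newVolumes);
        failed.retainAll(configured);
    }

    /** Stands in for the "NumFailedVolumes" value described in the issue. */
    public int getNumFailedVolumes() {
        return failed.size();
    }

    /** Stands in for the "FailedStorageLocations" value described in the issue. */
    public Set<String> getFailedStorageLocations() {
        return new LinkedHashSet<>(failed);
    }

    public static void main(String[] args) {
        FailedVolumeSketch dn = new FailedVolumeSketch(Arrays.asList("D1", "D2"));
        dn.markFailed("D1");
        // prints "1 [D1]"
        System.out.println(dn.getNumFailedVolumes() + " " + dn.getFailedStorageLocations());

        dn.reconfigure(Arrays.asList("D2"));
        // prints "0 []"
        System.out.println(dn.getNumFailedVolumes() + " " + dn.getFailedStorageLocations());
    }
}
```

In this model the bug described above corresponds to {{reconfigure}} skipping the {{retainAll}} step, which would leave the count at 1 and the location list at "D1" after D1 is removed from the conf.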