Date: Thu, 17 Mar 2016 02:48:33 +0000 (UTC)
From: "Harsh J (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-9949) Testcase for catching DN UUID regeneration regression

     [ https://issues.apache.org/jira/browse/HDFS-9949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated HDFS-9949:
--------------------------
    Description:
In the following scenario, in releases without HDFS-8211, the DN may regenerate its UUID unintentionally.

0. Consider a DN with two disks, {{/data1/dfs/dn}} and {{/data2/dfs/dn}}
1. Stop the DN
2. Unmount the second disk, {{/data2/dfs/dn}}
3. Create {{/data2/dfs/dn}} on the root filesystem (in the scenario, this was an accident)
4. Start the DN
5. The DN now considers {{/data2/dfs/dn}} empty and formats it, but during the format it uses {{datanode.getDatanodeUuid()}}, which is null until {{register()}} is called.
6. As a result, after the directories are loaded, the condition in {{datanode.checkDatanodeUuid()}} passes, and a new UUID is generated and written to all disks, i.e. both {{/data1/dfs/dn/current/VERSION}} and {{/data2/dfs/dn/current/VERSION}}.
7. Stop the DN (in the scenario, this was when the unmounted-disk mistake was realised)
8. Mount the second disk, {{/data2/dfs/dn}}, back again, so its {{VERSION}} file is the original one once more (the mount masks the root path we last generated upon).
9. The DN fails to start because it finds mismatched UUIDs between the two disks, with an error similar to:

{code}WARN org.apache.hadoop.hdfs.server.common.Storage: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data/2/dfs/dn is in an inconsistent state: Root /data/2/dfs/dn: DatanodeUuid=fe3a848f-beb8-4fcb-9581-c6fb1c701cc4, does not match 8ea9493c-7097-4ee3-96a3-0cc4dfc1d6ac from other StorageDirectory.{code}

The DN should not generate a new UUID if one of the storage disks already has the older one.

HDFS-8211 unintentionally fixes this by changing the {{datanode.getDatanodeUuid()}} function to rely on the {{DataStorage}} representation of the UUID rather than on the {{DatanodeID}} object, which only becomes available (non-null) _after_ registration.
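For illustration, the shape of that change is roughly the following (a paraphrased sketch based on the description above, not the actual HDFS-8211 diff; the {{id}} and {{storage}} field names in {{DataNode}} are assumptions):

{code}
// Before HDFS-8211 (sketch): the UUID is read off the DatanodeID, which is
// only populated once register() has run, so it is still null while the
// storage directories are being loaded and formatted at startup.
public String getDatanodeUuid() {
  return id == null ? null : id.getDatanodeUuid();
}

// After HDFS-8211 (sketch): the UUID comes from DataStorage, which already
// holds the value read from the existing VERSION file(s), so formatting one
// empty disk can no longer lead to a brand-new UUID for the whole DN.
public String getDatanodeUuid() {
  return storage == null ? null : storage.getDatanodeUuid();
}
{code}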
It would still be good to add a direct test case for the above scenario that passes on trunk and branch-2 but fails on branch-2.7 and lower, so that we can catch a regression around this in the future (a rough sketch of such a test follows the issue summary below).


> Testcase for catching DN UUID regeneration regression
> -----------------------------------------------------
>
>                 Key: HDFS-9949
>                 URL: https://issues.apache.org/jira/browse/HDFS-9949
>             Project: Hadoop HDFS
>          Issue Type: Test
>    Affects Versions: 2.6.0
>            Reporter: Harsh J
>            Assignee: Harsh J
>            Priority: Minor
>        Attachments: HDFS-9949.000.branch-2.7.not-for-commit.patch, HDFS-9949.000.patch, HDFS-9949.001.patch
>
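A minimal sketch of what such a test could look like, assuming a single-DN {{MiniDFSCluster}} with its default two storage directories (this is not the attached patch; the test class name and the exact {{getInstanceStorageDir()}}/{{restartDataNode()}} test-utility calls are assumptions):

{code}
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.Test;

public class TestDataNodeUuidRegeneration {

  @Test
  public void testUuidSurvivesEmptiedStorageDir() throws Exception {
    Configuration conf = new HdfsConfiguration();
    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
    try {
      cluster.waitActive();
      String originalUuid = cluster.getDataNodes().get(0).getDatanodeUuid();

      // Steps 1-3: stop the DN and swap its second storage directory for an
      // empty one, as if the mount were missing and the path recreated.
      MiniDFSCluster.DataNodeProperties dnProps = cluster.stopDataNode(0);
      File dir2 = cluster.getInstanceStorageDir(0, 1);
      File saved = new File(dir2.getParent(), "saved-" + dir2.getName());
      assertTrue(dir2.renameTo(saved));
      assertTrue(dir2.mkdirs());

      // Steps 4-6: restart; without HDFS-8211 the DN formats the empty dir
      // and regenerates its UUID onto every disk at this point.
      assertTrue(cluster.restartDataNode(dnProps, true));
      cluster.waitActive();
      assertEquals(originalUuid,
          cluster.getDataNodes().get(0).getDatanodeUuid());

      // Steps 7-9: stop again, "remount" the original directory, and check
      // that the DN still starts with its original UUID intact.
      dnProps = cluster.stopDataNode(0);
      assertTrue(FileUtil.fullyDelete(dir2));
      assertTrue(saved.renameTo(dir2));
      assertTrue(cluster.restartDataNode(dnProps, true));
      cluster.waitActive();
      assertEquals(originalUuid,
          cluster.getDataNodes().get(0).getDatanodeUuid());
    } finally {
      cluster.shutdown();
    }
  }
}
{code}

On branches without HDFS-8211, the first assertion after the restart should already fail, since the freshly formatted directory triggers a new UUID that is also written to the first disk's {{VERSION}} file.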
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)