hadoop-hdfs-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9949) Add a test case to ensure that the DataNode does not regenerate its UUID when a storage directory is cleared
Date Thu, 17 Mar 2016 18:30:33 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200117#comment-15200117 ]

Hudson commented on HDFS-9949:

FAILURE: Integrated in Hadoop-trunk-Commit #9473 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9473/])
HDFS-9949. Add a test case to ensure that the DataNode does not (cmccabe: rev dc951e606f40bb779632a8a3e3a46aeccc4a446a)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeUUID.java

> Add a test case to ensure that the DataNode does not regenerate its UUID when a storage directory is cleared
> ------------------------------------------------------------------------------------------------------------
>                 Key: HDFS-9949
>                 URL: https://issues.apache.org/jira/browse/HDFS-9949
>             Project: Hadoop HDFS
>          Issue Type: Test
>    Affects Versions: 2.6.0
>            Reporter: Harsh J
>            Assignee: Harsh J
>            Priority: Minor
>             Fix For: 2.8.0
>         Attachments: HDFS-9949.000.branch-2.7.not-for-commit.patch, HDFS-9949.000.patch,
> In the following scenario, in releases without HDFS-8211, the DN may regenerate its UUID:
> 0. Consider a DN with two disks {{/data1/dfs/dn,/data2/dfs/dn}}
> 1. Stop DN
> 2. Unmount the second disk, {{/data2/dfs/dn}}
> 3. Create (in the scenario, this was an accident) /data2/dfs/dn on the root path
> 4. Start DN
> 5. DN now considers /data2/dfs/dn empty so formats it, but during the format it uses {{datanode.getDatanodeUuid()}}, which is null until register() is called.
> 6. As a result, after the directory loading, {{datanode.checkDatanodeUuid()}} is called, its generate-a-new-UUID condition passes, and a new UUID is generated and written to all disks: {{/data1/dfs/dn/current/VERSION}} and {{/data2/dfs/dn/current/VERSION}}.
> 7. Stop DN (in the scenario, this was when the mistake of unmounted disk was realised)
> 8. Mount the second disk back again at {{/data2/dfs/dn}}, causing the {{VERSION}} file on it to be the original one again (mounting masks the root path we last generated upon).
> 9. DN fails to start up because it finds mismatched UUIDs between the two disks, with an error similar to:
> {code}WARN org.apache.hadoop.hdfs.server.common.Storage: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data/2/dfs/dn is in an inconsistent state: Root /data/2/dfs/dn: DatanodeUuid=fe3a848f-beb8-4fcb-9581-c6fb1c701cc4, does not match 8ea9493c-7097-4ee3-96a3-0cc4dfc1d6ac from other StorageDirectory.{code}
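The failure mode in steps 5-6 can be sketched in plain Java. This is a hypothetical, simplified model (the class and field names are illustrative, not the real Hadoop classes): because the DatanodeID-backed UUID is still null before register() runs, the null check passes and a fresh UUID is generated, clobbering the one the surviving disk already recorded.

```java
import java.util.UUID;

// Hypothetical sketch, NOT the actual Hadoop implementation: models why
// formatting a storage directory before register() regenerates the UUID.
class DataNodeSketch {
    String datanodeUuid;  // backed by DatanodeID; stays null until register() runs

    // Mirrors the gist of checkDatanodeUuid(): generate a UUID only if none is set.
    String checkDatanodeUuid() {
        if (datanodeUuid == null) {
            // New UUID is then written out to every disk's current/VERSION file.
            datanodeUuid = UUID.randomUUID().toString();
        }
        return datanodeUuid;
    }
}
```

Calling checkDatanodeUuid() in this pre-registration state always produces a brand-new UUID, even though /data1's VERSION file still holds the old one, which is exactly the mismatch that surfaces in step 9.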
> The DN should not generate a new UUID if one of the storage disks already has the older UUID.
> HDFS-8211 unintentionally fixes this by changing the {{datanode.getDatanodeUuid()}} function to rely on the {{DataStorage}} representation of the UUID rather than the {{DatanodeID}} object, which only becomes available (non-null) _after_ registration.
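The effect of that change can be sketched as follows, again with hypothetical, simplified names rather than the actual Hadoop API: the UUID is answered from the storage-layer copy, which is populated as soon as current/VERSION is parsed, so a format that happens before registration no longer sees null and no longer regenerates the UUID.

```java
import java.util.UUID;

// Hypothetical sketch, NOT the real Hadoop classes: models the post-HDFS-8211
// behavior where the UUID comes from the storage layer, not DatanodeID.
class DataStorageSketch {
    String uuid;  // read from current/VERSION when the storage directory loads
}

class DataNodeAfterFix {
    final DataStorageSketch storage = new DataStorageSketch();

    // After the fix: delegate to the storage copy, which is non-null as soon
    // as any VERSION file has been parsed, well before register() runs.
    String getDatanodeUuid() {
        return storage.uuid;
    }

    String checkDatanodeUuid() {
        if (getDatanodeUuid() == null) {
            storage.uuid = UUID.randomUUID().toString();
        }
        return getDatanodeUuid();
    }
}
```

With a UUID already loaded from the first disk, checkDatanodeUuid() now returns the existing value instead of minting a new one, so both disks keep matching VERSION files.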
> It'd still be good to add a direct test case for the above scenario that passes on trunk and branch-2 but fails on branch-2.7 and lower, so we can catch a regression around this in the future.

This message was sent by Atlassian JIRA
