hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-107) Data-nodes should be formatted when the name-node is formatted.
Date Tue, 14 Jun 2011 19:16:47 GMT

    [ https://issues.apache.org/jira/browse/HDFS-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049349#comment-13049349 ]

Konstantin Shvachko commented on HDFS-107:
------------------------------------------

Uma, if I understand your approach correctly, it doesn't work. Block IDs are unique only within
one cluster. If you change the namespaceID on a DataNode, the NN will treat that node's blocks as
belonging to this cluster and can mix them up with the blocks that really were created under that
namespaceID.
Why would you optimize the format operation anyway? People don't actually format large clusters.
I've never heard of such a thing. Data is too important. So the format operation is mostly useful
for small test clusters.
Option (1) gives an appropriate automation of the manual removal of storage directories.
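
To make the block ID concern concrete, here is a minimal sketch. It assumes a name-node that
trusts any data-node whose namespaceID matches and then looks blocks up by ID alone. The class
and method names are hypothetical and are not Hadoop APIs; the block IDs and paths are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; none of these names are Hadoop classes.
final class BlockIdCollisionSketch {

    /**
     * Returns the reported block IDs that the name-node would take for replicas
     * of its own blocks. Assumption: a matching namespaceID is the only check,
     * and blocks are then identified by ID alone.
     */
    static List<Long> blocksAcceptedAsOwn(int nameNodeNamespaceId,
                                          Map<Long, String> nameNodeBlocks,
                                          int dataNodeNamespaceId,
                                          long[] reportedBlockIds) {
        List<Long> accepted = new ArrayList<>();
        if (nameNodeNamespaceId != dataNodeNamespaceId) {
            return accepted; // data-node is rejected, so nothing can be mixed up
        }
        for (long id : reportedBlockIds) {
            if (nameNodeBlocks.containsKey(id)) {
                accepted.add(id); // same ID, but possibly different data
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Blocks known to this cluster's name-node (made-up IDs and paths).
        Map<Long, String> thisCluster = new HashMap<>();
        thisCluster.put(1001L, "/a/file1");
        thisCluster.put(1002L, "/a/file2");

        // Block report from a data-node that belonged to another cluster.
        long[] foreignReport = {1001L, 7777L};

        // Original namespaceID (2) differs from the name-node's (1): prints []
        System.out.println(blocksAcceptedAsOwn(1, thisCluster, 2, foreignReport));
        // namespaceID rewritten to 1: foreign block 1001 is accepted, prints [1001]
        System.out.println(blocksAcceptedAsOwn(1, thisCluster, 1, foreignReport));
    }
}
```

In this sketch, rewriting the namespaceID removes the only thing that distinguishes the foreign
block 1001 from the local one.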

> Data-nodes should be formatted when the name-node is formatted.
> ---------------------------------------------------------------
>
>                 Key: HDFS-107
>                 URL: https://issues.apache.org/jira/browse/HDFS-107
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.23.0
>            Reporter: Konstantin Shvachko
>         Attachments: HDFS-107-1.patch
>
>
> The upgrade feature HADOOP-702 requires data-nodes to persistently store the namespaceID
> in their version files and verify during startup that it matches the one stored on the name-node.
> When the name-node reformats it generates a new namespaceID.
> Now if the cluster starts with the reformatted name-node and data-nodes that were not reformatted,
> the data-nodes will fail with
> java.io.IOException: Incompatible namespaceIDs ...
> Data-nodes should be reformatted whenever the name-node is. I see 2 approaches here:
> 1) In order to reformat the cluster we call "start-dfs -format" or make a special script "format-dfs".
> This would format the cluster components all together. The question is whether it should start
> the cluster after formatting?
> 2) Format the name-node only. When data-nodes connect to the name-node it will tell them to
> format their storage directories if it sees that the namespace is empty and its cTime=0.
> The drawback of this approach is that we can lose blocks of a data-node from another cluster
> if it connects by mistake to the empty name-node.
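
For readers following the description above, the startup check and the condition in approach (2)
could be sketched roughly as below. This is an assumption-laden illustration, not the HDFS
implementation: the class, the StorageInfo holder, and both method names are hypothetical, and the
text appended to the exception message is illustrative rather than the real message.

```java
import java.io.IOException;

// Illustrative sketch only; these are not Hadoop classes.
final class NamespaceIdCheckSketch {

    /** Hypothetical stand-in for the values a node keeps in its version file. */
    static final class StorageInfo {
        final int namespaceID;
        final long cTime;

        StorageInfo(int namespaceID, long cTime) {
            this.namespaceID = namespaceID;
            this.cTime = cTime;
        }
    }

    /** Startup check: the data-node refuses to start if the IDs differ. */
    static void verifyNamespace(StorageInfo nameNode, StorageInfo dataNode)
            throws IOException {
        if (nameNode.namespaceID != dataNode.namespaceID) {
            // Message detail after the prefix is illustrative only.
            throw new IOException("Incompatible namespaceIDs: name-node="
                    + nameNode.namespaceID + ", data-node=" + dataNode.namespaceID);
        }
    }

    /**
     * Approach (2): the name-node asks a connecting data-node to format only
     * when its own namespace is empty and its cTime is 0.
     */
    static boolean wouldRequestFormat(boolean namespaceEmpty, StorageInfo nameNode) {
        return namespaceEmpty && nameNode.cTime == 0L;
    }

    public static void main(String[] args) {
        StorageInfo reformattedNameNode = new StorageInfo(42, 0L); // made-up values
        StorageInfo staleDataNode = new StorageInfo(17, 0L);

        try {
            verifyNamespace(reformattedNameNode, staleDataNode);
        } catch (IOException e) {
            System.out.println("data-node startup failed: " + e.getMessage());
        }
        System.out.println(wouldRequestFormat(true, reformattedNameNode));
    }
}
```

Note that wouldRequestFormat looks only at the name-node's state, which is exactly why a data-node
from another cluster that connects by mistake would be told to format under approach (2).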

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
