hadoop-common-user mailing list archives

From Konstantin Shvachko <...@yahoo-inc.com>
Subject Re: DFS replication and Error Recovery on failure
Date Mon, 29 Dec 2008 19:36:20 GMT

> 1) If I set dfs.replication to 3 only in the hadoop-site.xml of the
> namenode (master) and then restart the cluster, will this take effect,
> or do I have to change hadoop-site.xml on all slaves?

dfs.replication is a name-node parameter, so you only need to restart
the name-node for the new value to take effect.
I should mention that setting a new value will not immediately change
the replication of existing blocks, because replication is set per file;
you need to use setReplication to change it for files that already exist.
New files, however, will get the new value automatically.
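
To make the per-file behaviour concrete, here is a toy in-memory model.
It is only a sketch and not Hadoop code: the class ReplicationModel and
all of its methods are hypothetical names made up for illustration. In a
real client the corresponding call is FileSystem.setReplication(Path, short).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: mimics "replication is stored per file", nothing more.
public class ReplicationModel {
    private short defaultReplication;
    private final Map<String, Short> fileReplication = new LinkedHashMap<>();

    public ReplicationModel(short defaultReplication) {
        this.defaultReplication = defaultReplication;
    }

    // Models changing dfs.replication and restarting: only the default moves.
    public void setDefaultReplication(short value) {
        defaultReplication = value;
    }

    // A new file records whatever default is in effect when it is created.
    public void createFile(String path) {
        fileReplication.put(path, defaultReplication);
    }

    // Models an explicit per-file request; returns false for unknown paths.
    public boolean setReplication(String path, short value) {
        if (!fileReplication.containsKey(path)) {
            return false;
        }
        fileReplication.put(path, value);
        return true;
    }

    public short getReplication(String path) {
        Short value = fileReplication.get(path);
        if (value == null) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        return value;
    }

    public static void main(String[] args) {
        ReplicationModel dfs = new ReplicationModel((short) 1);
        dfs.createFile("/old");                   // created under the old default
        dfs.setDefaultReplication((short) 3);     // config change plus restart
        dfs.createFile("/new");                   // created under the new default
        System.out.println("/old = " + dfs.getReplication("/old"));
        System.out.println("/new = " + dfs.getReplication("/new"));
        dfs.setReplication("/old", (short) 3);    // explicit per-file change
        System.out.println("/old = " + dfs.getReplication("/old"));
    }
}
```

The existing file keeps its old replication until it is changed
explicitly, which is the situation described above.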

> 2) What could cause the following error on a datanode?
> ERROR org.apache.hadoop.dfs.DataNode: java.io.IOException: Incompatible
> namespaceIDs in
> /mnt/hadoop28/HADOOP/hadoop-0.16.3/tmp/dir/hadoop-hadoop/dfs/data:
> namenode namespaceID = 1396640905; datanode namespaceID = 820259954

The namespaceID provides cluster integrity: the name-node and all
data-nodes of a cluster share the same value.
The error means either that you ran the data-nodes with another name-node,
or that you reformatted the name-node recently.
It is better to have a dedicated directory for data-node storage rather
than use "tmp".

> If my data node goes down due to the above error, what should I do in
> the following scenarios?
> 1) I have some data on the corrupted data node that I need to recover.
> How can I recover that data?

You should first determine which cluster the data-node belongs to.
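
One way to check this is to compare the namespaceID recorded by the
data-node with the one recorded by each candidate name-node. The sketch
below assumes each storage directory keeps a VERSION file in Java
properties format with a namespaceID entry (to my knowledge under
current/VERSION, but verify this on your installation). The class and
method names are hypothetical, and the sample text reuses the two IDs
from the error message above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class NamespaceIdCheck {
    // Parses the text of a VERSION file and returns its namespaceID.
    // Assumes Java properties format with a "namespaceID" key.
    public static int readNamespaceId(String versionFileText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(versionFileText));
        } catch (IOException e) {
            throw new IllegalStateException("cannot parse VERSION text", e);
        }
        String id = props.getProperty("namespaceID");
        if (id == null) {
            throw new IllegalArgumentException("no namespaceID entry found");
        }
        return Integer.parseInt(id.trim());
    }

    // A data-node belongs to a name-node's cluster only if the IDs match.
    public static boolean sameCluster(String nameNodeVersion, String dataNodeVersion) {
        return readNamespaceId(nameNodeVersion) == readNamespaceId(dataNodeVersion);
    }

    public static void main(String[] args) {
        // Sample contents; in practice read the text from the VERSION files.
        String nameNode = "namespaceID=1396640905\nstorageType=NAME_NODE\n";
        String dataNode = "namespaceID=820259954\nstorageType=DATA_NODE\n";
        System.out.println(sameCluster(nameNode, dataNode)
                ? "data-node matches this name-node"
                : "data-node belongs to a different (or reformatted) name-node");
    }
}
```

If the IDs match for some other name-node, the blocks belong to that
cluster and the data-node should be started against it.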

> 2) If I don't care about the data, but I want the node back in the
> cluster, can I just delete /mnt/hadoop28/HADOOP/hadoop-0.16.3/tmp
> and include the node back in the cluster?

Yes, you can remove the directory if you don't need the data.

Thanks,
--Konstantin
