hadoop-common-dev mailing list archives

From "Bryan Pendleton (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-56) hadoop nameserver does not recognise ndfs nameserver image
Date Mon, 27 Feb 2006 19:47:14 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-56?page=comments#action_12368013 ] 

Bryan Pendleton commented on HADOOP-56:

The advantage of the GUID over enhancing the configuration is that it prevents anyone from
"screwing up": the defaults will never cause data loss. Without it, even with "format", it's
possible for someone to bring up a set of datanodes on a shared set of machines and start
clobbering data. With a GUID, this is no longer an accident that can happen. The original
issue that spawned this discussion was exactly this kind of situation: settings in a config
file led to data "cleanup" that wasn't desired. With a GUID, this risk goes away. Without
it, especially with default data directories like /tmp/, it's very easy for two different
people to clobber each other's data by running datanode instances on the same machine(s).

I understand there's a big pushback against complexity. But Hadoop is a component likely
to be used in a lot of situations, by users who might or might not have complete control of
their cluster. The DFS layer is supposed to provide data reliability, so it seems appropriate
to put in guards against bad end-user behavior/misconfigurations, as long as the cost is small.
The performance cost should be small (what, an extra string during the initial chat between
namenode/datanode?), and so should the storage cost (no more than a few extra bytes in the filename
of each block, or a whole GUID subdir, rather than/in addition to the suggested named paths).
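
To make the cost argument concrete, here is a minimal sketch of what such a GUID check could look like. It is an illustration only: the class and method names (ClusterIdCheck, verifyOrAdopt, blockDir) are hypothetical and are not taken from the Hadoop code base, and it assumes the GUID is a UUID string created once when the filesystem is formatted.

```java
import java.nio.file.Path;
import java.util.UUID;

/** Hypothetical sketch; none of these names come from the Hadoop code base. */
final class ClusterIdCheck {

    /** Raised when a datanode's stored GUID differs from the namenode's. */
    static final class ClusterIdMismatchException extends RuntimeException {
        ClusterIdMismatchException(String message) {
            super(message);
        }
    }

    /** Assumed to be called once, when the filesystem is formatted. */
    static String newClusterId() {
        return UUID.randomUUID().toString();
    }

    /**
     * The "extra string" exchanged during the initial namenode/datanode chat.
     * Returns the GUID the datanode should keep on disk.
     */
    static String verifyOrAdopt(String namenodeId, String storedDatanodeId) {
        if (storedDatanodeId == null) {
            // Fresh storage directory: adopt the namenode's GUID.
            return namenodeId;
        }
        if (!storedDatanodeId.equals(namenodeId)) {
            // The storage belongs to a different filesystem: refuse to serve
            // it, and delete nothing.
            throw new ClusterIdMismatchException(
                "storage GUID " + storedDatanodeId
                    + " does not match namenode GUID " + namenodeId);
        }
        return storedDatanodeId;
    }

    /** The "GUID subdir" variant: blocks live under a per-filesystem directory. */
    static Path blockDir(Path dataRoot, String clusterId) {
        return dataRoot.resolve(clusterId);
    }
}
```

With the subdirectory variant, two people sharing a default data directory such as /tmp/ would end up writing under different GUID-named directories, so neither would see or clean up the other's blocks.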

A much weaker alternative, to prevent only the one worst case I'm highlighting, would be for
a datanode to shut down with an error if *none* of the blocks in its storage directory
are from live files in the DFS. I think that is a far less powerful fix, with the only benefit
being that it doesn't require changing the behavior of virtually any of the existing code.
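
The weaker check can be sketched in a few lines as well. Again this is only an illustration under stated assumptions: the names (OrphanStorageCheck, looksLikeForeignStorage, checkOrAbort) are hypothetical, blocks are assumed to be identified by long IDs, and the datanode is assumed to learn from the namenode which block IDs belong to live files.

```java
import java.util.Collection;
import java.util.Set;

/** Hypothetical sketch; none of these names come from the Hadoop code base. */
final class OrphanStorageCheck {

    /**
     * True when the storage directory holds blocks, yet not one of them
     * belongs to a live file in the DFS.
     */
    static boolean looksLikeForeignStorage(Collection<Long> localBlockIds,
                                           Set<Long> liveBlockIds) {
        if (localBlockIds.isEmpty()) {
            // A brand-new, empty datanode is not suspicious.
            return false;
        }
        for (Long id : localBlockIds) {
            if (liveBlockIds.contains(id)) {
                // At least one block is known, so this is plausibly our storage.
                return false;
            }
        }
        return true;
    }

    /** Shut down with an error instead of "cleaning up" unknown blocks. */
    static void checkOrAbort(Collection<Long> localBlockIds, Set<Long> liveBlockIds) {
        if (looksLikeForeignStorage(localBlockIds, liveBlockIds)) {
            throw new IllegalStateException(
                "no local block belongs to a live file; refusing to start");
        }
    }
}
```

Note the gap this leaves open: a single block ID that happens to match is enough to pass the check, which is one reason it is a weaker guard than the GUID.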

> hadoop nameserver does not recognise ndfs nameserver image
> ----------------------------------------------------------
>          Key: HADOOP-56
>          URL: http://issues.apache.org/jira/browse/HADOOP-56
>      Project: Hadoop
>         Type: Bug
>   Components: dfs
>     Versions: 0.1
>     Reporter: Yoram Arnon
>     Priority: Critical
>  Attachments: ndfs.tar.gz
> hadoop nameserver does not recognise ndfs image
> Thus, upgrading from ndfs to hadoop dfs results in total data loss.
> The upgrade should be seamless, with the new server recognising all previous versions
> that are not end-of-life'd.

This message is automatically generated by JIRA.
