hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-4521) invalid network topologies should not be cached
Date Fri, 15 Mar 2013 19:10:12 GMT

     [ https://issues.apache.org/jira/browse/HDFS-4521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-4521:
---------------------------------------

    Attachment: HDFS-4521.008.patch

* {{TestNetworkTopology}}: add a JUnit test that creates an invalid network topology, fixes
the topology, restarts the DN, and verifies that it all works.

* rename {{clearCachedMappings}} to {{reloadCachedMappings}}.  {{StaticMapping#reloadCachedMappings}}
should do nothing, since {{StaticMapping}} does not have a cache.  ({{StaticMapping}} is only
used by JUnit tests, which is why I didn't see this earlier.)

The next two changes fall under the heading of more robust error handling.

* {{NetworkTopology}}: previously we threw an exception *after* adding the invalid node to
the topology tree.  We should throw the exception first and not add the invalid node.

* {{DatanodeManager#registerDatanode}}: remove the {{DatanodeDescriptor}} we were about to
add if any exception is thrown during the addition process.  Move the catch block for
{{InvalidTopologyException}} to the end of the function, so that both code paths which can call
{{NetworkTopology#add}} (as well as any code paths added in the future) reach it on this error.
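
A minimal Java sketch of the two error-handling changes above, assuming a simplified topology in which every leaf must sit at the same depth. The names {{TopologySketch}}, {{SimpleTopology}}, {{Registry}} and {{register}} are hypothetical and are not the Hadoop API; the sketch only illustrates the ordering of validate, add, and roll back.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for NetworkTopology and DatanodeManager; not Hadoop code.
class TopologySketch {

    static class InvalidTopologyException extends RuntimeException {
        InvalidTopologyException(String msg) { super(msg); }
    }

    // Simplified rule (an assumption): all leaf nodes must be at the same depth.
    static class SimpleTopology {
        private final Set<String> leaves = new HashSet<>();
        private int leafDepth = -1; // -1 until the first leaf is added

        // "/rack1/node1" has depth 2, "/node2" has depth 1.
        private static int depthOf(String path) {
            return path.split("/").length - 1;
        }

        void add(String path) {
            int depth = depthOf(path);
            // Validate first, so a rejected node never reaches the tree.
            if (leafDepth != -1 && depth != leafDepth) {
                throw new InvalidTopologyException(
                    "leaf " + path + " is at depth " + depth + ", expected " + leafDepth);
            }
            leafDepth = depth;
            leaves.add(path);
        }

        boolean contains(String path) { return leaves.contains(path); }
        int size() { return leaves.size(); }
    }

    static class Registry {
        final Map<String, String> descriptors = new HashMap<>();
        final SimpleTopology topology = new SimpleTopology();

        // Returns true if the node was registered, false if it was rejected.
        boolean register(String nodeId, String path) {
            try {
                descriptors.put(nodeId, path);
                topology.add(path);
                return true;
            } catch (InvalidTopologyException e) {
                // Single catch at the end: remove the descriptor we were about to add.
                descriptors.remove(nodeId);
                return false;
            }
        }
    }
}
```

With this ordering, a rejected registration leaves neither a descriptor nor a topology entry behind, so a later registration with a corrected path can succeed.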
                
> invalid network topologies should not be cached
> -----------------------------------------------
>
>                 Key: HDFS-4521
>                 URL: https://issues.apache.org/jira/browse/HDFS-4521
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.0.5-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>         Attachments: HDFS-4521.001.patch, HDFS-4521.002.patch, HDFS-4521.005.patch, HDFS-4521.006.patch, HDFS-4521.008.patch
>
>
> When the network topology is invalid, the DataNode refuses to start with a message such as this:
> {quote}
> org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.registerDatanode from 172.29.122.23:55886: error:
> org.apache.hadoop.net.NetworkTopology$InvalidTopologyException: Invalid network topology.
> You cannot have a rack and a non-rack node at the same level of the network topology.
> {quote}
> This is expected if you specify a topology file or script which puts leaf nodes at two
> different depths.  However, one problem we have now is that this incorrect topology is cached
> forever.  Once the NameNode sees it, this DataNode can never be added to the cluster, since
> this exception will be rethrown each time.  The NameNode will not check to see if the topology
> file or script has changed.  We should clear the topology mappings when there is an
> InvalidTopologyException, to prevent this problem.
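
The caching problem described above can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: {{CacheReloadSketch}}, {{CachingResolver}}, {{resolve}} and {{register}} are hypothetical names, the topology file or script is modeled as a plain {{Function}}, and validity is reduced to "the resolved path has the expected depth". Only {{reloadCachedMappings}} is a name taken from this issue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; none of these classes are part of Hadoop.
class CacheReloadSketch {

    static class InvalidTopologyException extends RuntimeException {
        InvalidTopologyException(String msg) { super(msg); }
    }

    // Caches host -> network path. The source stands in for a topology file or script.
    static class CachingResolver {
        private final Map<String, String> cache = new HashMap<>();
        private final Function<String, String> source;

        CachingResolver(Function<String, String> source) { this.source = source; }

        // Consults the source only on a cache miss.
        String resolve(String host) {
            return cache.computeIfAbsent(host, source);
        }

        // Forget everything, so the next resolve() re-reads the source.
        void reloadCachedMappings() { cache.clear(); }
    }

    // "/rack1/dn1" has depth 2, "/dn1" has depth 1.
    static int depthOf(String path) { return path.split("/").length - 1; }

    // Simplified validity rule (an assumption): the path must have the expected depth.
    static String register(CachingResolver resolver, String host, int expectedDepth) {
        String path = resolver.resolve(host);
        if (depthOf(path) != expectedDepth) {
            // Drop the cached mappings before failing, so a fixed source is picked up next time.
            resolver.reloadCachedMappings();
            throw new InvalidTopologyException(host + " resolved to " + path);
        }
        return path;
    }
}
```

Without the {{reloadCachedMappings}} call in the failure path, a second registration attempt would keep seeing the stale cached path even after the source had been corrected.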

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
