hadoop-hdfs-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1379) Multihoming brokenness in HDFS
Date Tue, 07 Sep 2010 15:42:33 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906841#action_12906841 ]

Allen Wittenauer commented on HDFS-1379:

Some of the issues here are also covered in HADOOP-6364.

But yes, multi-homing is a known brokenness.

It is probably worth pointing out that:

a) the bang-for-buck of having a separate network for IPC/RPC communications isn't very good,
so pretty much no one does it

b) monitoring a private interface instead of the public one leaves you exposed to failures
on the network side

> Multihoming brokenness in HDFS
> ------------------------------
>                 Key: HDFS-1379
>                 URL: https://issues.apache.org/jira/browse/HDFS-1379
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client, name-node
>    Affects Versions: 0.20.1
>         Environment: Multi-homed namenode and datanodes. hadoop-0.20.1 (Cloudera distribution on Linux)
>            Reporter: Matthew Byng-Maddick
> We have a setup where, because we only have a very few machines (4 x 16 core), we're
looking at co-locating namenodes and datanodes. We also have front-end and back-end networks.
The set-up is something like:
> * machine1
> ** front-end
> ** back-end
> * machine2
> ** front-end
> ** back-end
> * machine3
> ** front-end
> ** back-end
> * machine4
> ** front-end
> ** back-end
> On each, the property *slave.host.name* is configured with the 192. address, (the *.dns.interface
settings don't actually seem to help, but that's a separate problem), and the *dfs.datanode.address*
is bound to the 192.168.24.x address on :50010, similarly the *dfs.datanode.ipc.address* is
bound there.
> In order to get efficient use of our machines, we bring up a namenode on one of them
(this then rsyncs the latest namenode fsimage etc) by bringing up a VIP on each side (we use
the 10.18.80.x side for monitoring, rather than actual hadoop comms), and binding the namenode
to that - on the inside this is
> The namenode now knows about 4 datanodes - These datanodes know
how they're bound. However, when the datanode is telling an external hdfs client where to
store the blocks, it gives out as one of the addresses (despite the datanode
not being bound there) - because that's where the datanode->namenode RPC comes from.
> This is wrong because if you've bound the datanode explicitly (using *dfs.datanode.address*)
then that should be the only address the namenode can give out (it's reasonable, given your
comms model, not to support NAT between clients and data slaves). If you bind it to * then
your normal rules for slave.host.name, dfs.datanode.dns.interface etc. should take precedence.
> This may already be fixed in releases later than 0.20.1, but if it isn't, it should probably
be: you explicitly allow binding of the datanode addresses, so it's unreasonable to expect
that comms to the datanode will always come from those addresses, especially in multi-homed
environments (and separating traffic out by network is useful, especially when dealing with
large volumes of data).
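
The selection rule the reporter asks for can be sketched in a few lines. This is an illustration only, not Hadoop's actual code: the function name `advertised_datanode_address` and its parameters are hypothetical, the host octets (`.11`) are made up for the example, and only IPv4 `host:port` strings are handled. It assumes the namenode knows the datanode's configured bind address, the source IP of the datanode's RPC, and an optional `slave.host.name` value.

```python
from typing import Optional

# Bind hosts treated as "listen on every interface" (an assumption of this sketch).
WILDCARD_HOSTS = {"0.0.0.0", "*"}


def advertised_datanode_address(
    configured_bind: str,
    rpc_source_ip: str,
    slave_host_name: Optional[str] = None,
) -> str:
    """Pick the host:port that would be handed to clients for one datanode.

    configured_bind: the datanode's dfs.datanode.address value, e.g. "192.168.24.11:50010".
    rpc_source_ip:   the address the datanode->namenode RPC arrived from.
    slave_host_name: the slave.host.name value, if one is configured.
    """
    host, sep, port = configured_bind.rpartition(":")
    if not sep or not host or not port:
        raise ValueError(f"expected host:port, got {configured_bind!r}")

    if host not in WILDCARD_HOSTS:
        # Explicit bind: the datanode listens nowhere else, so this is
        # the only address worth giving out.
        return f"{host}:{port}"

    # Wildcard bind: fall back to the configured host name if there is one,
    # otherwise to the RPC source address.
    fallback = slave_host_name or rpc_source_ip
    return f"{fallback}:{port}"


if __name__ == "__main__":
    # Datanode bound on the back-end network, RPC seen from the front-end side.
    print(advertised_datanode_address("192.168.24.11:50010", "10.18.80.11"))
```

With an explicit bind, the RPC source address is ignored entirely, which is the behaviour the report argues for; the RPC source is used only as a last resort when the datanode is bound to the wildcard and no host name is configured.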

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
