hadoop-hdfs-user mailing list archives

From Shady Xu <shad...@gmail.com>
Subject Re: How to distcp data between two clusters which are not in the same local network?
Date Tue, 16 Aug 2016 02:27:52 GMT
Thanks Wei-Chiu and Sunil. I had read the docs you mentioned before
starting. The specific problem now is that the DataNodes of the source
cluster report their local IPs instead of their public ones, and the local
IPs cannot be reached from the NodeManagers of the destination cluster. It
seems the solution is to set the `dfs.datanode.dns.interface` property, but
unfortunately that doesn't work.
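
For reference, the multihoming guide linked in the quoted reply below describes hostname-based DataNode access as the usual approach when clients cannot reach the IPs that the DataNodes report. A minimal sketch of that configuration follows. It assumes the source DataNodes' hostnames resolve to their public IPs from the destination cluster (for example through DNS or /etc/hosts entries, which are not described in this thread), and it has not been verified against this setup.

```xml
<!-- hdfs-site.xml fragment: a sketch, not a verified fix for this setup. -->
<!-- Assumption: the source DataNodes' hostnames resolve to their public IPs
     when looked up from the destination cluster. -->
<configuration>
  <!-- Client side (here, the distcp map tasks): connect to DataNodes by
       hostname rather than by the IP address the NameNode hands out. -->
  <property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
  </property>
  <!-- DataNode side: use hostnames for DataNode-to-DataNode transfers. -->
  <property>
    <name>dfs.datanode.use.datanode.hostname</name>
    <value>true</value>
  </property>
</configuration>
```

Note that `dfs.client.use.datanode.hostname` is read by the client, so it would need to be visible to the distcp job on the destination cluster, not only set on the source DataNodes.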

2016-08-15 22:06 GMT+08:00 Sunil Govind <sunil.govind@gmail.com>:

> Hi
> I think you can also refer to the link below.
> http://aajisaka.github.io/hadoop-project/hadoop-distcp/DistCp.html
> Thanks
> Sunil
> On Mon, Aug 15, 2016 at 7:26 PM Wei-Chiu Chuang <weichiu@apache.org>
> wrote:
>> Hello,
>> if I understand your question correctly, you are actually building a
>> multihomed Hadoop cluster, correct?
>> A multihomed Hadoop cluster can be tricky to set up, to the extent that
>> Cloudera does not recommend it. I've not set up a multihomed Hadoop cluster
>> before, but I think you have to make sure reverse resolution works for
>> the IP addresses.
>> https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html
>> On Mon, Aug 15, 2016 at 1:06 AM, Shady Xu <shadyxu@gmail.com> wrote:
>>> Hi all,
>>> Recently I tried to use distcp to copy data between two clusters which
>>> are not in the same local network. Fortunately, the nodes of the source
>>> cluster each have an extra interface and IP which can be accessed from the
>>> destination cluster. But during the distcp run, the map tasks always
>>> used the local IPs of the source cluster nodes, which they cannot reach.
>>> I tried changing the property 'dfs.datanode.dns.interface' to the one I
>>> want, and I also tried setting the property
>>> 'dfs.datanode.use.datanode.hostname' to true. Neither works.
>>> Does Hadoop support this now, or am I missing something?
