hadoop-hdfs-user mailing list archives

From Shady Xu <shad...@gmail.com>
Subject Re: How to distcp data between two clusters which are not in the same local network?
Date Thu, 25 Aug 2016 05:38:24 GMT
Thanks iain, it works now. I read the doc you mentioned, but forgot to set
the `dfs.client.use.datanode.hostname` property in the destination cluster.
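
For reference, the setting looks roughly like this in the `hdfs-site.xml` used on the destination cluster, where the distcp map tasks run. This is a sketch rather than my exact config. It assumes the DataNode hostnames of the source cluster resolve to their externally reachable IPs from the destination nodes (for example through DNS or /etc/hosts entries).

```xml
<!-- hdfs-site.xml on the destination cluster, which acts as the HDFS client during distcp -->
<!-- With this set to true, clients connect to DataNodes by hostname
     rather than by the IP address the NameNode reports for them. -->
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
```

The hostname is then resolved on the client side, which is why the name resolution on the destination nodes matters here.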

Though I still don't know why the `dfs.datanode.dns.interface` property does
not work. I read through the related source code but didn't find anything.

2016-08-25 1:48 GMT+08:00 iain wright <iainwrig@gmail.com>:

> @Shady, please see:
> https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html
> --
> Iain Wright
> On Wed, Aug 24, 2016 at 2:17 AM, Shady Xu <shadyxu@gmail.com> wrote:
>> Anyone any idea?
>> 2016-08-16 10:27 GMT+08:00 Shady Xu <shadyxu@gmail.com>:
>>> Thanks Wei-Chiu and Sunil, I have read the docs you mentioned before
>>> starting. The specific problem now is that the DataNodes of the source
>>> cluster report their local ip instead of the public one, which cannot be
>>> accessed from the NodeManagers of the destination cluster. Seems the
>>> solution is to set the `dfs.datanode.dns.interface` property but
>>> unfortunately it doesn't work.
>>> 2016-08-15 22:06 GMT+08:00 Sunil Govind <sunil.govind@gmail.com>:
>>>> Hi
>>>> I think you can also refer below link too.
>>>> http://aajisaka.github.io/hadoop-project/hadoop-distcp/DistCp.html
>>>> Thanks
>>>> Sunil
>>>> On Mon, Aug 15, 2016 at 7:26 PM Wei-Chiu Chuang <weichiu@apache.org>
>>>> wrote:
>>>>> Hello,
>>>>> if I understand your question correctly, you are actually building a
>>>>> multi-homed Hadoop cluster, correct?
>>>>> A multi-homed Hadoop cluster can be tricky to set up, to the extent that
>>>>> Cloudera does not recommend it. I've not set up a multi-homed Hadoop cluster
>>>>> before, but I think you have to make sure the reverse resolution works
>>>>> for the IP addresses.
>>>>> https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html
>>>>> On Mon, Aug 15, 2016 at 1:06 AM, Shady Xu <shadyxu@gmail.com> wrote:
>>>>>> Hi all,
>>>>>> Recently I tried to use distcp to copy data between two clusters which
>>>>>> are not in the same local network. Fortunately, each node of the source
>>>>>> cluster has an extra interface and IP which can be accessed from the
>>>>>> destination cluster. But during the distcp run, the map tasks
>>>>>> used the local IPs of the source cluster nodes, which they cannot reach.
>>>>>> I tried changing the property 'dfs.datanode.dns.interface' to the
>>>>>> interface I want, and I tried changing the property
>>>>>> 'dfs.datanode.use.datanode.hostname' to true too. Nothing works.
>>>>>> Does Hadoop support this now, or am I missing something?
