hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9868) Add ability for DistCp to run between 2 clusters
Date Tue, 28 Feb 2017 05:03:45 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887273#comment-15887273
] 

Yongjun Zhang commented on HDFS-9868:
-------------------------------------

Thanks [~xiaochen]. I just committed HADOOP-14127.

I found a better way to distribute the conf files with DistributedCache: we could use
{code}
public void addCacheArchive(URI uri)
{code}
If we create a tar file out of the conf dir and use this API to send the tar file to the
distributed cache, the same tarred dir hierarchy will be extracted and made available in the
current working directory.

See http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/filecache/DistributedCache.html
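
A minimal sketch of the packaging step described above, using only the JDK. It is an illustration under stated assumptions, not the code from any HDFS-9868 patch: the class name `ConfArchiver` and the file names are hypothetical, and because the JDK has no tar writer the sketch builds a zip archive instead of a tar (the distributed cache is generally understood to unpack zip and jar archives as well, but that should be verified for the Hadoop version in use). The call that would register the archive is shown only as a comment, since it needs Hadoop on the classpath.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: packs a conf dir into one archive, preserving its hierarchy.
final class ConfArchiver {

    /** Writes every regular file under confDir into zipFile, keyed by its relative path. */
    static void zipDirectory(Path confDir, Path zipFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(confDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                // Zip entry names always use '/' as the separator.
                String name = confDir.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny stand-in conf dir so the sketch runs without a cluster.
        Path confDir = Files.createTempDirectory("sourceConf");
        Files.createDirectories(confDir.resolve("ha"));
        Files.write(confDir.resolve("ha").resolve("hdfs-site.xml"),
                "<configuration/>".getBytes(StandardCharsets.UTF_8));

        Path zipFile = Files.createTempFile("sourceConf", ".zip");
        zipDirectory(confDir, zipFile);

        // Assumption: the "#sourceConf" fragment names the link under which the
        // unpacked archive appears in the task's working directory.
        URI cacheUri = URI.create(zipFile.toUri() + "#sourceConf");
        System.out.println("Archive to register: " + cacheUri);
        // With Hadoop available, and the archive copied to a filesystem the
        // cluster can read (assumption), this would be:
        //   job.addCacheArchive(cacheUri);   // org.apache.hadoop.mapreduce.Job
    }
}
```

The `addCacheArchive(URI uri)` signature quoted above matches the instance method on `org.apache.hadoop.mapreduce.Job`, which is why the commented call uses a `job` variable.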

  

> Add ability for DistCp to run between 2 clusters
> ------------------------------------------------
>
>                 Key: HDFS-9868
>                 URL: https://issues.apache.org/jira/browse/HDFS-9868
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: distcp
>    Affects Versions: 2.7.1
>            Reporter: NING DING
>            Assignee: NING DING
>         Attachments: HDFS-9868.05.patch, HDFS-9868.06.patch, HDFS-9868.07.patch, HDFS-9868.08.patch,
HDFS-9868.09.patch, HDFS-9868.10.patch, HDFS-9868.1.patch, HDFS-9868.2.patch, HDFS-9868.3.patch,
HDFS-9868.4.patch
>
>
> Normally the HDFS cluster is HA enabled. Copying a large amount of data with DistCp can
take a long time. If the source cluster changes its active NameNode during the copy, the DistCp
job fails. This patch lets DistCp read source cluster files in HA access mode. A source cluster
configuration file needs to be specified (via the -sourceClusterConf option).
>   The following is an example of the contents of a source cluster configuration
>   file:
> {code:xml}
>     <configuration>
>       <property>
> 		<name>fs.defaultFS</name>
> 		<value>hdfs://mycluster</value>
> 	  </property>
> 	  <property>
> 		<name>dfs.nameservices</name>
> 		<value>mycluster</value>
> 	  </property>
> 	  <property>
> 		<name>dfs.ha.namenodes.mycluster</name>
> 		<value>nn1,nn2</value>
> 	  </property>
> 	  <property>
> 		<name>dfs.namenode.rpc-address.mycluster.nn1</name>
> 		<value>host1:9000</value>
> 	  </property>
> 	  <property>
> 		<name>dfs.namenode.rpc-address.mycluster.nn2</name>
> 		<value>host2:9000</value>
> 	  </property>
> 	  <property>
> 		<name>dfs.namenode.http-address.mycluster.nn1</name>
> 		<value>host1:50070</value>
> 	  </property>
> 	  <property>
> 		<name>dfs.namenode.http-address.mycluster.nn2</name>
> 		<value>host2:50070</value>
> 	  </property>
> 	  <property>
> 		<name>dfs.client.failover.proxy.provider.mycluster</name>
> 		<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
> 	  </property>
> 	</configuration>
> {code}
>   The invocation of DistCp is as below:
> {code}
>     bash$ hadoop distcp -sourceClusterConf sourceCluster.xml /foo/bar hdfs://nn2:8020/bar/foo
> {code}
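
To make the relationship between the properties in the sample configuration concrete, here is a small JDK-only sketch that reads such a file and lists the NameNode RPC addresses an HA client would choose between. It does not use Hadoop's own `Configuration` class, and the class name `HaConfReader` and its methods are hypothetical; only the property names come from the example above.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: resolves the NameNodes behind a logical nameservice.
final class HaConfReader {

    /** Reads <property><name/><value/></property> pairs from a Hadoop-style XML file. */
    static Map<String, String> readProperties(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        Map<String, String> props = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
            props.put(name, value);
        }
        return props;
    }

    /** Maps each NameNode ID listed for the nameservice to its RPC address. */
    static Map<String, String> rpcAddresses(Map<String, String> props, String nameservice) {
        Map<String, String> result = new LinkedHashMap<>();
        String ids = props.getOrDefault("dfs.ha.namenodes." + nameservice, "");
        for (String rawId : ids.split(",")) {
            String id = rawId.trim();
            if (id.isEmpty()) {
                continue;
            }
            String addr = props.get("dfs.namenode.rpc-address." + nameservice + "." + id);
            if (addr != null) {
                result.put(id, addr);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Trimmed copy of the sample configuration from the issue description.
        String xml = "<configuration>"
                + "<property><name>dfs.nameservices</name><value>mycluster</value></property>"
                + "<property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>"
                + "<property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>host1:9000</value></property>"
                + "<property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>host2:9000</value></property>"
                + "</configuration>";
        Map<String, String> props =
                readProperties(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        for (String ns : props.get("dfs.nameservices").split(",")) {
            System.out.println(ns.trim() + " -> " + rpcAddresses(props, ns.trim()));
        }
    }
}
```

The lookup chain is `dfs.nameservices` to `dfs.ha.namenodes.<nameservice>` to `dfs.namenode.rpc-address.<nameservice>.<id>`, which is why the source cluster file has to carry all three groups of properties.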



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

