hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9868) Add ability for DistCp to run between 2 clusters
Date Fri, 24 Feb 2017 06:55:45 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15882089#comment-15882089

Yongjun Zhang commented on HDFS-9868:

Thanks for the new rev that implements confMap [~xiaochen], below are my comments:

# separate the /log4j.properties change to a new jira
# loading confMap file should be done once at initialization, and then the map is used when
needed. Currently it's open one time for each file, which should really be avoided, especially
for many-file cases, this would certainly impact performance.
# should not create a conf object for each file, it should be a ref to a common conf object
created in item 2 above.
# FileSystemAccessService has a better implementation of loading conf file.
# suggest to add a method in Path class to get back the cluster path, instead of creating
it at client code.

Would you please address these and I will review further after that.


> Add ability for DistCp to run between 2 clusters
> ------------------------------------------------
>                 Key: HDFS-9868
>                 URL: https://issues.apache.org/jira/browse/HDFS-9868
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: distcp
>    Affects Versions: 2.7.1
>            Reporter: NING DING
>            Assignee: NING DING
>         Attachments: HDFS-9868.05.patch, HDFS-9868.06.patch, HDFS-9868.07.patch, HDFS-9868.08.patch,
HDFS-9868.1.patch, HDFS-9868.2.patch, HDFS-9868.3.patch, HDFS-9868.4.patch
> Normally the HDFS cluster is HA enabled. It could take a long time when coping huge data
by distp. If the source cluster changes active namenode, the distp will run failed. This patch
supports the DistCp can read source cluster files in HA access mode. A source cluster configuration
file needs to be specified (via the -sourceClusterConf option).
>   The following is an example of the contents of a source cluster configuration
>   file:
> {code:xml}
>     <configuration>
>       <property>
> 		<name>fs.defaultFS</name>
> 		<value>hdfs://mycluster</value>
> 	  </property>
> 	  <property>
> 		<name>dfs.nameservices</name>
> 		<value>mycluster</value>
> 	  </property>
> 	  <property>
> 		<name>dfs.ha.namenodes.mycluster</name>
> 		<value>nn1,nn2</value>
> 	  </property>
> 	  <property>
> 		<name>dfs.namenode.rpc-address.mycluster.nn1</name>
> 		<value>host1:9000</value>
> 	  </property>
> 	  <property>
> 		<name>dfs.namenode.rpc-address.mycluster.nn2</name>
> 		<value>host2:9000</value>
> 	  </property>
> 	  <property>
> 		<name>dfs.namenode.http-address.mycluster.nn1</name>
> 		<value>host1:50070</value>
> 	  </property>
> 	  <property>
> 		<name>dfs.namenode.http-address.mycluster.nn2</name>
> 		<value>host2:50070</value>
> 	  </property>
> 	  <property>
> 		<name>dfs.client.failover.proxy.provider.mycluster</name>
> 		<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
> 	  </property>
> 	</configuration>
> {code}
>   The invocation of DistCp is as below:
> {code}
>     bash$ hadoop distcp -sourceClusterConf sourceCluster.xml /foo/bar hdfs://nn2:8020/bar/foo
> {code}

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

View raw message