hadoop-hdfs-issues mailing list archives

From "Xiao Chen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-9868) add reading source cluster with HA access mode feature for DistCp
Date Tue, 08 Nov 2016 02:20:58 GMT

     [ https://issues.apache.org/jira/browse/HDFS-9868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Chen updated HDFS-9868:
----------------------------
    Attachment: HDFS-9868.05.patch

I'm attaching patch 5 to help move this forward; [~iceberg565], I hope you don't mind. Thanks
again for the work so far. Feel free to let me know if you want to continue working on this.

Here's what's in patch 5:
- Rebased to latest trunk, mainly due to HDFS-9640, as [~jojochuang] pointed out.
- Addressed the comments above.
- Made various minor modifications based on my review.

A more general comment I'm still trying to address is that 'source' here seems vague: which
cluster counts as the source really depends on where the {{distcp}} command is run. In the doc
example, it actually looks more like a 'destination' config, so I'm thinking of generalizing it
as a 'remote' configuration. Additionally, it seems we should accept a directory so that both
{{hdfs-site.xml}} and {{core-site.xml}} can be read. There may also be some MR/YARN-level
changes needed; I'll test and see.
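
To make the directory idea concrete, here is a rough sketch in Python (not the patch's code, which is Java). It assumes a hypothetical helper, {{load_remote_conf}}, that reads {{core-site.xml}} and then {{hdfs-site.xml}} from one directory and lets the later file override the earlier one. The function names and the override order are illustrative assumptions, not something the patch defines.

```python
import os
import xml.etree.ElementTree as ET


def read_site_file(path):
    """Return {name: value} for every <property> in a Hadoop-style XML file."""
    props = {}
    for prop in ET.parse(path).getroot().iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def load_remote_conf(conf_dir, file_names=("core-site.xml", "hdfs-site.xml")):
    """Merge the site files found in conf_dir (hypothetical helper).

    Files are read in the order given; a key defined in a later file
    replaces the value from an earlier one. Missing files are skipped.
    """
    merged = {}
    for file_name in file_names:
        path = os.path.join(conf_dir, file_name)
        if os.path.isfile(path):
            merged.update(read_site_file(path))
    return merged
```

Calling {{load_remote_conf("/path/to/remote/conf")}} would then return a single dictionary holding the properties from both files.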

> add reading source cluster with HA access mode feature for DistCp
> -----------------------------------------------------------------
>
>                 Key: HDFS-9868
>                 URL: https://issues.apache.org/jira/browse/HDFS-9868
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: distcp
>    Affects Versions: 2.7.1
>            Reporter: NING DING
>            Assignee: NING DING
>         Attachments: HDFS-9868.05.patch, HDFS-9868.1.patch, HDFS-9868.2.patch, HDFS-9868.3.patch, HDFS-9868.4.patch
>
>
> HDFS clusters are normally HA enabled. Copying a large amount of data with distcp can take a long time, and if the source cluster's active NameNode changes during the copy, the distcp job fails. This patch lets DistCp read source cluster files in HA access mode. A source cluster configuration file needs to be specified (via the -sourceClusterConf option).
>   The following is an example of the contents of a source cluster configuration
>   file:
> {code:xml}
>     <configuration>
>       <property>
>         <name>fs.defaultFS</name>
>         <value>hdfs://mycluster</value>
>       </property>
>       <property>
>         <name>dfs.nameservices</name>
>         <value>mycluster</value>
>       </property>
>       <property>
>         <name>dfs.ha.namenodes.mycluster</name>
>         <value>nn1,nn2</value>
>       </property>
>       <property>
>         <name>dfs.namenode.rpc-address.mycluster.nn1</name>
>         <value>host1:9000</value>
>       </property>
>       <property>
>         <name>dfs.namenode.rpc-address.mycluster.nn2</name>
>         <value>host2:9000</value>
>       </property>
>       <property>
>         <name>dfs.namenode.http-address.mycluster.nn1</name>
>         <value>host1:50070</value>
>       </property>
>       <property>
>         <name>dfs.namenode.http-address.mycluster.nn2</name>
>         <value>host2:50070</value>
>       </property>
>       <property>
>         <name>dfs.client.failover.proxy.provider.mycluster</name>
>         <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>       </property>
>     </configuration>
> {code}
>   The invocation of DistCp is as below:
> {code}
>     bash$ hadoop distcp -sourceClusterConf sourceCluster.xml /foo/bar hdfs://nn2:8020/bar/foo
> {code}
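
The example file above ties several keys to the nameservice name and the NameNode IDs, so a typo in either can silently break failover. As a hedged illustration only, the Python sketch below checks that each nameservice listed in {{dfs.nameservices}} has its NameNode list, one RPC address per NameNode, and a failover proxy provider. The helper names {{parse_properties}} and {{missing_ha_keys}} are hypothetical; the issue does not say whether the patch performs any such validation.

```python
import xml.etree.ElementTree as ET


def parse_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style XML string."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def split_list(value):
    """Split a comma-separated value, dropping empty entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def missing_ha_keys(props):
    """List the HA-related keys from the example that are absent (hypothetical check)."""
    missing = []
    nameservices = split_list(props.get("dfs.nameservices", ""))
    if not nameservices:
        missing.append("dfs.nameservices")
    for ns in nameservices:
        nn_key = "dfs.ha.namenodes." + ns
        namenodes = split_list(props.get(nn_key, ""))
        if not namenodes:
            missing.append(nn_key)
        for nn in namenodes:
            rpc_key = "dfs.namenode.rpc-address.%s.%s" % (ns, nn)
            if rpc_key not in props:
                missing.append(rpc_key)
        provider_key = "dfs.client.failover.proxy.provider." + ns
        if provider_key not in props:
            missing.append(provider_key)
    return missing
```

Running {{missing_ha_keys(parse_properties(text))}} on the contents of a candidate configuration file returns an empty list when all of these keys are present.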



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


