hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9868) Add ability for DistCp to run between 2 clusters
Date Sat, 25 Feb 2017 22:50:45 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15884424#comment-15884424 ]

Yongjun Zhang commented on HDFS-9868:
-------------------------------------

Thanks for the new rev [~xiaochen]. Below are my comments on rev9:

1. Replace "path.toUri().getScheme() + "://" + path.toUri().getAuthority()" with a method
in the Path class.
This is what I meant by my prior comment #5.
2. Better check whether t is null below
{code}
      final Throwable t = e.getCause();
      if (t instanceof UnknownHostException) {
        return DistCpConstants.UNKNOWN_HOST;
      }
{code}
3. Fix the comment "* Setup source cluster configuration on the job configuration."; the conf
is now for all clusters, not only the source.
4. Instead of using the distributed cache, I suggest storing the map file in the same
location as the sequence file.
5. Create a dedicated method for the code below (also apply comment #2 above)
{code}
      String uri =
          source.toUri().getScheme() + "://" + source.toUri().getAuthority();
      Configuration sourceConf = confMap.get(uri);
      if (sourceConf != null) {
        LOG.trace("Overriding configuration from confMap for path: {}",
            source.toString());
        final Path proxy = new Path(source.toUri()) {
          @Override
          public FileSystem getFileSystem(Configuration conf)
              throws IOException {
            return FileSystem.get(this.toUri(), sourceConf);
          }
        };
{code}
6. In CopyMapper, the confMap is set but not used. We should apply it when getting the source
and target file systems.
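
To illustrate comments #1, #2 and #5 together, here is a minimal sketch using only the JDK. It is not code from the patch: the class name ConfLookupSketch and the methods clusterKey, confFor and causedByUnknownHost are hypothetical, java.net.URI stands in for the result of Path.toUri(), and java.util.Properties stands in for Hadoop's Configuration. Note that instanceof already evaluates to false for a null reference, so the explicit null check mainly documents intent and protects any later dereference of t.

```java
import java.net.URI;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper; all names here are illustrative, not from the patch. */
final class ConfLookupSketch {

  /** Returns "scheme://authority" for a URI, or null if either part is absent. */
  static String clusterKey(URI uri) {
    if (uri == null || uri.getScheme() == null || uri.getAuthority() == null) {
      return null;
    }
    return uri.getScheme() + "://" + uri.getAuthority();
  }

  /** Looks up per-cluster settings. Properties is a stand-in for Configuration. */
  static Properties confFor(Map<String, Properties> confMap, URI uri) {
    String key = clusterKey(uri);
    return key == null ? null : confMap.get(key);
  }

  /** Reports whether the cause of e is an UnknownHostException, guarding against null. */
  static boolean causedByUnknownHost(Exception e) {
    final Throwable t = e.getCause();
    if (t == null) {
      return false;
    }
    return t instanceof UnknownHostException;
  }

  public static void main(String[] args) {
    Map<String, Properties> confMap = new HashMap<>();
    Properties sourceConf = new Properties();
    sourceConf.setProperty("dfs.nameservices", "mycluster");
    confMap.put("hdfs://mycluster", sourceConf);

    // A fully qualified path resolves to the per-cluster settings.
    System.out.println(confFor(confMap, URI.create("hdfs://mycluster/foo/bar")));
    // A path without scheme and authority has no key, so the lookup yields null.
    System.out.println(confFor(confMap, URI.create("/foo/bar")));
    // An exception without a cause is handled without dereferencing null.
    System.out.println(causedByUnknownHost(new RuntimeException("no cause")));
  }
}
```

Keeping the key construction in one place means the listing code and CopyMapper would build the confMap key the same way, which is the point of comments #1 and #6.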




> Add ability for DistCp to run between 2 clusters
> ------------------------------------------------
>
>                 Key: HDFS-9868
>                 URL: https://issues.apache.org/jira/browse/HDFS-9868
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: distcp
>    Affects Versions: 2.7.1
>            Reporter: NING DING
>            Assignee: NING DING
>         Attachments: HDFS-9868.05.patch, HDFS-9868.06.patch, HDFS-9868.07.patch, HDFS-9868.08.patch, HDFS-9868.09.patch, HDFS-9868.1.patch, HDFS-9868.2.patch, HDFS-9868.3.patch, HDFS-9868.4.patch
>
>
> Normally the HDFS cluster is HA enabled. Copying a large amount of data with distcp can
> take a long time. If the source cluster changes its active namenode during the copy, the
> distcp job fails. This patch lets DistCp read source cluster files in HA access mode. A
> source cluster configuration file needs to be specified (via the -sourceClusterConf option).
>   The following is an example of the contents of a source cluster configuration file:
> {code:xml}
>     <configuration>
>       <property>
>         <name>fs.defaultFS</name>
>         <value>hdfs://mycluster</value>
>       </property>
>       <property>
>         <name>dfs.nameservices</name>
>         <value>mycluster</value>
>       </property>
>       <property>
>         <name>dfs.ha.namenodes.mycluster</name>
>         <value>nn1,nn2</value>
>       </property>
>       <property>
>         <name>dfs.namenode.rpc-address.mycluster.nn1</name>
>         <value>host1:9000</value>
>       </property>
>       <property>
>         <name>dfs.namenode.rpc-address.mycluster.nn2</name>
>         <value>host2:9000</value>
>       </property>
>       <property>
>         <name>dfs.namenode.http-address.mycluster.nn1</name>
>         <value>host1:50070</value>
>       </property>
>       <property>
>         <name>dfs.namenode.http-address.mycluster.nn2</name>
>         <value>host2:50070</value>
>       </property>
>       <property>
>         <name>dfs.client.failover.proxy.provider.mycluster</name>
>         <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>       </property>
>     </configuration>
> {code}
>   The invocation of DistCp is as below:
> {code}
>     bash$ hadoop distcp -sourceClusterConf sourceCluster.xml /foo/bar hdfs://nn2:8020/bar/foo
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
