Date: Tue, 28 Feb 2017 16:12:45 +0000 (UTC)
From: "Yongjun Zhang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-9868) Add ability for DistCp to run between 2 clusters

    [ https://issues.apache.org/jira/browse/HDFS-9868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15888341#comment-15888341 ]

Yongjun Zhang commented on HDFS-9868:
-------------------------------------

Here is my proposed approach to handling confMap with addCacheArchive:

The directory given to distcp -confMapDir is a local dir on the host where distcp is to be run. It contains
{code}
/confMapping
//*xml
//*xml
//*xml
......
{code}
Content of /confMapping:
{code}
hdfs://x.y.z:8020
hdfs//
webhdfs://a.b.c:50070
......
{code}
Each subdirectory holds the conf files needed for one cluster: the first is for cluster1 (hdfs://x.y.z:8020), the second is the equivalent dir for cluster2 (hdfs//), and so on.

DistCp creates a tar file of the dir as .tar, then calls job.addCacheArchive(new URI(".tar"));

CopyMapper/CopyCommitter would access the files in the distributed cache as
{code}
./.tar/confMapping
./.tar//*xml
./.tar//*xml
./.tar//*xml
......
{code}
and call Configuration.addResource(Path path) to add the conf files.

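The lookup that CopyMapper/CopyCommitter would need can be sketched in plain Java. This is only an illustration under assumptions, not the patch's implementation: it assumes each line of confMapping holds a filesystem URI followed by whitespace and the name of that cluster's conf subdirectory, and the class name, method names, and subdirectory names are all hypothetical. Loading the resolved *xml files via Configuration.addResource(Path) is left out because it needs Hadoop on the classpath.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: maps a cluster URI to the conf subdirectory listed in confMapping.
class ConfMappingSketch {

    // Parses confMapping text. Assumed layout: one "<filesystem URI> <conf subdirectory>" pair per line.
    static Map<String, String> parse(String text) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed confMapping line: " + line);
            }
            mapping.put(authorityKey(URI.create(parts[0])), parts[1]);
        }
        return mapping;
    }

    // Reduces a URI to scheme://authority so that every path on one cluster shares a key.
    static String authorityKey(URI uri) {
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    // Returns the conf subdirectory for the cluster that owns the path, or null if it is unmapped.
    static String confDirFor(Map<String, String> mapping, URI path) {
        return mapping.get(authorityKey(path));
    }
}
```

With a mapping parsed from the extracted archive, a caller would pass each source or target path to confDirFor and then add every *xml file in the returned subdirectory to the job's configuration. A null result means the path belongs to a cluster with no extra conf, so the default configuration applies.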
> Add ability for DistCp to run between 2 clusters
> ------------------------------------------------
>
>                 Key: HDFS-9868
>                 URL: https://issues.apache.org/jira/browse/HDFS-9868
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: distcp
>    Affects Versions: 2.7.1
>            Reporter: NING DING
>            Assignee: NING DING
>         Attachments: HDFS-9868.05.patch, HDFS-9868.06.patch, HDFS-9868.07.patch, HDFS-9868.08.patch, HDFS-9868.09.patch, HDFS-9868.10.patch, HDFS-9868.1.patch, HDFS-9868.2.patch, HDFS-9868.3.patch, HDFS-9868.4.patch
>
>
> Normally the HDFS cluster is HA enabled. Copying a huge amount of data with distcp can take a long time. If the source cluster changes its active namenode during the copy, distcp fails. This patch lets DistCp read source cluster files in HA access mode. A source cluster configuration file needs to be specified (via the -sourceClusterConf option).
> The following is an example of the contents of a source cluster configuration
> file:
> {code:xml}
> <configuration>
>   <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://mycluster</value>
>   </property>
>   <property>
>     <name>dfs.nameservices</name>
>     <value>mycluster</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.mycluster</name>
>     <value>nn1,nn2</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.mycluster.nn1</name>
>     <value>host1:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.mycluster.nn2</name>
>     <value>host2:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.http-address.mycluster.nn1</name>
>     <value>host1:50070</value>
>   </property>
>   <property>
>     <name>dfs.namenode.http-address.mycluster.nn2</name>
>     <value>host2:50070</value>
>   </property>
>   <property>
>     <name>dfs.client.failover.proxy.provider.mycluster</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
> </configuration>
> {code}
> The invocation of DistCp is as below:
> {code}
> bash$ hadoop distcp -sourceClusterConf sourceCluster.xml /foo/bar hdfs://nn2:8020/bar/foo
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org