Date: Tue, 8 Nov 2016 02:20:58 +0000 (UTC)
From: "Xiao Chen (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-9868) add reading source cluster with HA access mode feature for DistCp
Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
archived-at: Tue, 08 Nov 2016 02:21:05 -0000

     [ https://issues.apache.org/jira/browse/HDFS-9868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Chen updated HDFS-9868:
----------------------------
    Attachment: HDFS-9868.05.patch

I'm attaching patch 5 to help move this forward; [~iceberg565], I hope you don't mind. Thanks again for the work so far.
Feel free to let me know if you want to continue the work on this.

Here's what's in patch 5:
- Rebased to the latest trunk, mainly due to HDFS-9640, as [~jojochuang] pointed out.
- Addressed the comments above.
- Various nit-level modifications based on my review.

A more general comment I'm still trying to address is that 'source' seems vague here. It really depends on where the {{distcp}} command is run. In the doc example, it actually looks more like a 'destination' config, so I'm thinking of generalizing it to a 'remote' configuration. Additionally, it seems we should accept a directory so that both {{hdfs-site.xml}} and {{core-site.xml}} can be read. There may also be some MR/YARN-level changes needed; I'll test and see.

> add reading source cluster with HA access mode feature for DistCp
> -----------------------------------------------------------------
>
>                 Key: HDFS-9868
>                 URL: https://issues.apache.org/jira/browse/HDFS-9868
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: distcp
>    Affects Versions: 2.7.1
>            Reporter: NING DING
>            Assignee: NING DING
>         Attachments: HDFS-9868.05.patch, HDFS-9868.1.patch, HDFS-9868.2.patch, HDFS-9868.3.patch, HDFS-9868.4.patch
>
>
> Normally the HDFS cluster is HA enabled. Copying a large amount of data with distcp can take a long time. If the source cluster changes its active NameNode during the copy, the distcp job fails. This patch lets DistCp read source cluster files in HA access mode. A source cluster configuration file needs to be specified (via the -sourceClusterConf option).
> The following is an example of the contents of a source cluster configuration file:
> {code:xml}
> <configuration>
>   <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://mycluster</value>
>   </property>
>   <property>
>     <name>dfs.nameservices</name>
>     <value>mycluster</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.mycluster</name>
>     <value>nn1,nn2</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.mycluster.nn1</name>
>     <value>host1:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.mycluster.nn2</name>
>     <value>host2:9000</value>
>   </property>
>   <property>
>     <name>dfs.namenode.http-address.mycluster.nn1</name>
>     <value>host1:50070</value>
>   </property>
>   <property>
>     <name>dfs.namenode.http-address.mycluster.nn2</name>
>     <value>host2:50070</value>
>   </property>
>   <property>
>     <name>dfs.client.failover.proxy.provider.mycluster</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
> </configuration>
> {code}
> The invocation of DistCp is as below:
> {code}
> bash$ hadoop distcp -sourceClusterConf sourceCluster.xml /foo/bar hdfs://nn2:8020/bar/foo
> {code}
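
For anyone preparing a configuration file like the one in the description, here is a minimal sketch of a sanity check that can be run before starting a long copy. It is written in Python purely for illustration and is not part of any attached patch. The helper names load_properties and missing_ha_keys are hypothetical, and the set of keys checked is an assumption based on the standard HDFS HA client settings named in the example (nameservice list, NameNode IDs, RPC addresses, failover proxy provider).

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse Hadoop-style <configuration> XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def _split_csv(value):
    """Split a comma-separated value into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def missing_ha_keys(props):
    """Return the HA client keys that are absent from the configuration."""
    missing = []
    nameservices = _split_csv(props.get("dfs.nameservices", ""))
    if not nameservices:
        missing.append("dfs.nameservices")
    for ns in nameservices:
        nn_key = f"dfs.ha.namenodes.{ns}"
        nn_ids = _split_csv(props.get(nn_key, ""))
        if not nn_ids:
            missing.append(nn_key)
        # Every listed NameNode ID needs its own RPC address.
        for nn in nn_ids:
            rpc_key = f"dfs.namenode.rpc-address.{ns}.{nn}"
            if rpc_key not in props:
                missing.append(rpc_key)
        provider_key = f"dfs.client.failover.proxy.provider.{ns}"
        if provider_key not in props:
            missing.append(provider_key)
    return missing


# Trimmed copy of the example from the issue description.
EXAMPLE = """<configuration>
  <property><name>dfs.nameservices</name><value>mycluster</value></property>
  <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>host1:9000</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>host2:9000</value></property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    # An empty list means none of the checked keys is missing.
    print(missing_ha_keys(load_properties(EXAMPLE)))
```

The check only confirms that the keys are present; it does not contact the NameNodes or verify that the addresses are reachable.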