Return-Path:
X-Original-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org
Delivered-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
    by minotaur.apache.org (Postfix) with SMTP id 085D711FFD
    for ; Sat, 23 Aug 2014 21:30:12 +0000 (UTC)
Received: (qmail 46411 invoked by uid 500); 23 Aug 2014 21:30:12 -0000
Delivered-To: apmail-hadoop-hdfs-issues-archive@hadoop.apache.org
Received: (qmail 46359 invoked by uid 500); 23 Aug 2014 21:30:12 -0000
Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: hdfs-issues@hadoop.apache.org
Delivered-To: mailing list hdfs-issues@hadoop.apache.org
Received: (qmail 46344 invoked by uid 99); 23 Aug 2014 21:30:12 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28)
    by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 23 Aug 2014 21:30:12 +0000
Date: Sat, 23 Aug 2014 21:30:11 +0000 (UTC)
From: "Dave Marion (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Commented] (HDFS-6376) Distcp data between two HA clusters requires another configuration
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/HDFS-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108155#comment-14108155 ]

Dave Marion commented on HDFS-6376:
-----------------------------------

bq. I think it might make more sense to explicitly specify the name service that the DNs should report to. Since the changes are trivial, I'll provide another patch.

I agree. I think there is a deficiency in the configuration properties. With federation, an HDFS cluster is a set of nameservices (ns1, ns2, ns3, ns4).
However, I don't think you can define an alias for the overall cluster such that cluster1 contains ns1 and ns2, and cluster2 contains ns3 and ns4. In hdfs-site.xml, all of the nameservices are listed and the DN tries to connect to all of them, with the downside that the first one that responds to the DN assigns the cluster id. My patch took the non-invasive approach as I am not familiar with the code base and all of the affected components. I think a better long-term implementation would be to specify the clusters (cluster1, cluster2) and their associated nameservices and specify which cluster is "this" cluster.

> Distcp data between two HA clusters requires another configuration
> ------------------------------------------------------------------
>
>                 Key: HDFS-6376
>                 URL: https://issues.apache.org/jira/browse/HDFS-6376
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, federation, hdfs-client
>    Affects Versions: 2.2.0, 2.3.0, 2.4.0
>         Environment: Hadoop 2.3.0
>            Reporter: Dave Marion
>            Assignee: Dave Marion
>             Fix For: 3.0.0
>
>         Attachments: HDFS-6376-2.patch, HDFS-6376-3-branch-2.4.patch, HDFS-6376-4-branch-2.4.patch, HDFS-6376-5-trunk.patch, HDFS-6376-6-trunk.patch, HDFS-6376-7-trunk.patch, HDFS-6376-branch-2.4.patch, HDFS-6376-patch-1.patch, HDFS-6376.000.patch, HDFS-6376.008.patch
>
>
> User has to create a third set of configuration files for distcp when transferring data between two HA clusters.
> Consider the scenario in [1]. You cannot put all of the required properties in core-site.xml and hdfs-site.xml for the client to resolve the location of both active namenodes. If you do, then the datanodes from cluster A may join cluster B. I cannot find a configuration option that tells the datanodes to federate blocks for only one of the clusters in the configuration.
> [1] http://mail-archives.apache.org/mod_mbox/hadoop-user/201404.mbox/%3CBAY172-W2133964E0C283968C161DD1520%40phx.gbl%3E

--
This message was sent by Atlassian JIRA
(v6.2#6252)