hadoop-hdfs-issues mailing list archives

From "Dave Marion (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6376) Distcp data between two HA clusters requires another configuration
Date Sat, 23 Aug 2014 21:30:11 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108155#comment-14108155 ]

Dave Marion commented on HDFS-6376:
-----------------------------------

bq. I think it might make more sense to explicitly specify the name service that the DNs should
report to. Since the changes are trivial, I'll provide another patch.

I agree. I think there is a deficiency in the configuration properties. With federation, an
HDFS cluster is a set of nameservices (ns1, ns2, ns3, ns4). However, I don't think you can
define an alias for the overall cluster such that cluster1 contains ns1 and ns2, and cluster2
contains ns3 and ns4. In hdfs-site.xml, all of the nameservices are listed and the DN tries
to connect to all of them, with the downside that the first one that responds to the DN assigns
the cluster ID. My patch took the non-invasive approach because I am not familiar with the code
base and all of the affected components. I think a better long-term implementation would be
to specify the clusters (cluster1, cluster2) and their associated nameservices, and to specify
which cluster is "this" cluster.
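
For illustration, the situation described above might look like the following hdfs-site.xml
fragment. This is only a sketch, not the attached patch: the hostnames are placeholders, only
ns1 is spelled out, and dfs.internal.nameservices is assumed here as the name of the property
that tells the DNs which nameservices to report to. Verify that name against the Hadoop
release in use.

```xml
<!-- hdfs-site.xml fragment (illustrative; hostnames and nameservice IDs are placeholders) -->
<configuration>
  <!-- Every nameservice a client must be able to resolve, across both clusters -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2,ns3,ns4</value>
  </property>

  <!-- HA NameNode pair for ns1; ns2 through ns4 would repeat this pattern -->
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>c1-nn1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>c1-nn2.example.com:8020</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>

  <!-- ASSUMPTION: property name not taken from this message. It is meant to limit
       the DNs of "this" cluster (here cluster1) to its own nameservices. -->
  <property>
    <name>dfs.internal.nameservices</name>
    <value>ns1,ns2</value>
  </property>
</configuration>
```

Without the last property, a DN started with this file would try to register with all four
nameservices, which is the cross-cluster join described in the issue.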

> Distcp data between two HA clusters requires another configuration
> ------------------------------------------------------------------
>
>                 Key: HDFS-6376
>                 URL: https://issues.apache.org/jira/browse/HDFS-6376
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, federation, hdfs-client
>    Affects Versions: 2.2.0, 2.3.0, 2.4.0
>         Environment: Hadoop 2.3.0
>            Reporter: Dave Marion
>            Assignee: Dave Marion
>             Fix For: 3.0.0
>
>         Attachments: HDFS-6376-2.patch, HDFS-6376-3-branch-2.4.patch, HDFS-6376-4-branch-2.4.patch,
HDFS-6376-5-trunk.patch, HDFS-6376-6-trunk.patch, HDFS-6376-7-trunk.patch, HDFS-6376-branch-2.4.patch,
HDFS-6376-patch-1.patch, HDFS-6376.000.patch, HDFS-6376.008.patch
>
>
> User has to create a third set of configuration files for distcp when transferring data
between two HA clusters.
> Consider the scenario in [1]. You cannot put all of the required properties in core-site.xml
and hdfs-site.xml for the client to resolve the location of both active namenodes. If you
do, then the datanodes from cluster A may join cluster B. I cannot find a configuration option
that tells the datanodes to federate blocks for only one of the clusters in the configuration.
> [1] http://mail-archives.apache.org/mod_mbox/hadoop-user/201404.mbox/%3CBAY172-W2133964E0C283968C161DD1520%40phx.gbl%3E



--
This message was sent by Atlassian JIRA
(v6.2#6252)
