hadoop-hdfs-issues mailing list archives

From "Haohui Mai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6376) Distcp data between two HA clusters requires another configuration
Date Mon, 25 Aug 2014 21:45:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109793#comment-14109793 ]

Haohui Mai commented on HDFS-6376:

bq. My patch took the non-invasive approach as I am not familiar with the code base and all
of the affected components. I think a better long term implementation would be to specify
the clusters (cluster1, cluster2) and their associated nameservices and specify which cluster
is "this" cluster.

There is no fundamental difference between an exclude list and an include list for the clusters.
It is somewhat easier to predict which NNs the DNs are going to report to based on the
configuration alone. I agree that having the ability to specify what a cluster is will simplify
the configuration.
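
To make the include/exclude comparison concrete, here is a minimal sketch. It is not
HDFS code: the function names and the nameservice IDs (cluster1-ns, cluster2-ns,
cluster3-ns) are hypothetical, and reading the predictability remark as favouring an
include list is an assumption. Both lists select the same nameservices for the datanodes
today, but only the include list gives the same answer after a new remote nameservice is
added to the shared configuration.

```python
def resolve_by_include(all_nameservices, include):
    """Nameservices the datanodes would report to, given an include list."""
    return [ns for ns in all_nameservices if ns in include]


def resolve_by_exclude(all_nameservices, exclude):
    """Nameservices the datanodes would report to, given an exclude list."""
    return [ns for ns in all_nameservices if ns not in exclude]


# Hypothetical configuration: the client needs both nameservices for distcp,
# but the datanodes of "this" cluster should only serve cluster1-ns.
all_ns = ["cluster1-ns", "cluster2-ns"]
by_include = resolve_by_include(all_ns, {"cluster1-ns"})
by_exclude = resolve_by_exclude(all_ns, {"cluster2-ns"})

# A third (remote) nameservice is later added to the shared configuration,
# while the include and exclude lists are left unchanged.
grown_ns = all_ns + ["cluster3-ns"]
by_include_grown = resolve_by_include(grown_ns, {"cluster1-ns"})
by_exclude_grown = resolve_by_exclude(grown_ns, {"cluster2-ns"})

print(by_include, by_exclude)
print(by_include_grown, by_exclude_grown)
```

With the include list, the answer depends only on the list itself. With the exclude list,
it also depends on every other nameservice that appears in the configuration.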

bq. I tried a change in DFSUtil in my patch6 (see below). I had to back it out as it caused
problems. I have had to use -ns in the admin commands and am used to using it now. My point
here is that if you have a complex configuration, then you may need to be more specific in
the commands that you execute. I think it's fair to force the user to specify the -ns argument.

Agree. I think it is fair to require the users to specify the nameservice in this complex

> Distcp data between two HA clusters requires another configuration
> ------------------------------------------------------------------
>                 Key: HDFS-6376
>                 URL: https://issues.apache.org/jira/browse/HDFS-6376
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, federation, hdfs-client
>    Affects Versions: 2.2.0, 2.3.0, 2.4.0
>         Environment: Hadoop 2.3.0
>            Reporter: Dave Marion
>            Assignee: Dave Marion
>             Fix For: 3.0.0
>         Attachments: HDFS-6376-2.patch, HDFS-6376-3-branch-2.4.patch, HDFS-6376-4-branch-2.4.patch,
>                      HDFS-6376-5-trunk.patch, HDFS-6376-6-trunk.patch, HDFS-6376-7-trunk.patch,
>                      HDFS-6376-branch-2.4.patch, HDFS-6376-patch-1.patch, HDFS-6376.000.patch,
>                      HDFS-6376.008.patch, HDFS-6376.009.patch
>
> User has to create a third set of configuration files for distcp when transferring data
> between two HA clusters.
> Consider the scenario in [1]. You cannot put all of the required properties in core-site.xml
> and hdfs-site.xml for the client to resolve the location of both active namenodes. If you
> do, then the datanodes from cluster A may join cluster B. I cannot find a configuration option
> that tells the datanodes to federate blocks for only one of the clusters in the configuration.
> [1] http://mail-archives.apache.org/mod_mbox/hadoop-user/201404.mbox/%3CBAY172-W2133964E0C283968C161DD1520%40phx.gbl%3E

This message was sent by Atlassian JIRA
