hadoop-hdfs-issues mailing list archives

From "Dave Marion (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6376) Distcp data between two HA clusters requires another configuration
Date Thu, 21 Aug 2014 16:59:11 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105586#comment-14105586 ]

Dave Marion commented on HDFS-6376:

Thanks for reviewing this. I thought this was dead and that I would forever have to patch
Hadoop for our application. Out of curiosity, have others started running into this issue?

bq. have you tested your patch for distcp between real clusters?

Yes. I have been running a version of this patch for about 2 months on a test cluster. We
are using Hadoop 2 so the patch that I am applying is a little different. My Hadoop 2 patch
also includes a change in the dfsclusterhealth.jsp file so that only NameNodes in "this" cluster
are shown. I could not find the same jsp file in the Hadoop 3 source. Generally speaking,
I think I have fixed all locations in the code that need to be fixed, but I could be missing
something that I don't know about. As you can see from the patch history, I thought I had
to make a change to DFSUtil, but it broke some things and I had to revert those changes.

bq. It will be great if you can generally mention how you patch works for both secured and
insecure HA clusters.

We are not using secured HA, so the patch has not been tested in that configuration.

bq.  Another nit is that we need to fix indents in the new unit test.

I'm happy to fix.

bq. Maybe we can rename the new configuration from "dfs.nameservice.cluster.excludes" to something
like "dfs.nameservices.cluster.outside"

I have no issues with changing the name. In my situation I have multiple HDFS nameservices
defined in hdfs-site.xml and I want to explicitly state which ones are not part of "this"
cluster. Exclude seemed like a good term for that.
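
To illustrate the exclude semantics described above, here is a minimal sketch. It is not code from the HDFS-6376 patch: the dict-based configuration, the function name local_nameservices, and the nameservice IDs clusterA and clusterB are hypothetical. The property dfs.nameservice.cluster.excludes is the name under discussion in this thread, and dfs.nameservices is the standard list of nameservice IDs.

```python
# Hypothetical sketch of the exclude semantics; not the patch's actual code.

def _split_csv(value):
    """Split a comma-separated property value into trimmed, non-empty IDs."""
    return [item.strip() for item in value.split(",") if item.strip()]


def local_nameservices(conf):
    """Return the nameservices that belong to "this" cluster.

    conf is a plain dict standing in for the settings in hdfs-site.xml
    (an assumption made only for this illustration).
    """
    all_ns = _split_csv(conf.get("dfs.nameservices", ""))
    # Property name as discussed in this thread; it may be renamed.
    excluded = set(_split_csv(conf.get("dfs.nameservice.cluster.excludes", "")))
    return [ns for ns in all_ns if ns not in excluded]


# Both clusters are visible to clients, but only clusterA is "this" cluster.
conf = {
    "dfs.nameservices": "clusterA,clusterB",
    "dfs.nameservice.cluster.excludes": "clusterB",
}
print(local_nameservices(conf))
```

Under this reading, a client such as distcp can still resolve every nameservice listed in dfs.nameservices, while the daemons of "this" cluster would only consider the nameservices that remain after the exclusion.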

> Distcp data between two HA clusters requires another configuration
> ------------------------------------------------------------------
>                 Key: HDFS-6376
>                 URL: https://issues.apache.org/jira/browse/HDFS-6376
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, federation, hdfs-client
>    Affects Versions: 2.3.0, 2.4.0
>         Environment: Hadoop 2.3.0
>            Reporter: Dave Marion
>            Assignee: Dave Marion
>             Fix For: 3.0.0
>         Attachments: HDFS-6376-2.patch, HDFS-6376-3-branch-2.4.patch, HDFS-6376-4-branch-2.4.patch,
>                      HDFS-6376-5-trunk.patch, HDFS-6376-6-trunk.patch, HDFS-6376-7-trunk.patch, HDFS-6376-branch-2.4.patch,
>
> User has to create a third set of configuration files for distcp when transferring data
> between two HA clusters.
> Consider the scenario in [1]. You cannot put all of the required properties in core-site.xml
> and hdfs-site.xml for the client to resolve the location of both active namenodes. If you
> do, then the datanodes from cluster A may join cluster B. I cannot find a configuration option
> that tells the datanodes to federate blocks for only one of the clusters in the configuration.
> [1] http://mail-archives.apache.org/mod_mbox/hadoop-user/201404.mbox/%3CBAY172-W2133964E0C283968C161DD1520%40phx.gbl%3E

This message was sent by Atlassian JIRA
