hadoop-user mailing list archives

From david marion <dlmar...@hotmail.com>
Subject Client usage with multiple clusters
Date Thu, 17 Apr 2014 23:42:53 GMT
 I'm having an issue in client code where there are multiple clusters with HA namenodes involved.
Example setup using Hadoop 2.3.0:

Cluster A with the following properties defined in its core-site.xml, hdfs-site.xml, etc.:


Cluster B has similar properties defined in its core-site.xml, hdfs-site.xml, etc.

Now, I want to be able to distcp from clusterA to clusterB. Regardless of which cluster I
am executing this from, neither has all of the information. Looking at DFSClient and DataNode:

  - if I put both clusterA and clusterB into dfs.nameservices, then the datanodes will try
to federate the blocks from both nameservices.
  - if I don't put both clusterA and clusterB into dfs.nameservices, then the client won't
know how to resolve both namenodes for the nameservices in the distcp command.

 I'm wondering if I am missing a property or something that will allow me to define both nameservices
on both clusters and have the datanodes for the cluster *not* try to federate. Looking at
DataNode, it appears that it tries to connect to all namenodes defined and the first one that
sets the clusterid wins. It seems that there should be a dfs.datanode.clusterid property that
the datanode uses. This seems to line up with the 'namenode -format -clusterid <cluster>'
command when you have multiple nameservices. Am I missing something in the configuration that
will allow me to do what I want? To get distcp to work I had to create a third set of configuration
files just for the client to use.
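
For illustration, a client-only hdfs-site.xml of the kind described above might look roughly like the sketch below. It lists both nameservices so the client can resolve either one, and it is never deployed to the datanodes. The namenode IDs (nn1, nn2), host names, and port are placeholders, not values from the actual clusters.

```xml
<!-- Client-only hdfs-site.xml (sketch). Host names, port, and namenode IDs are placeholders. -->
<configuration>
  <!-- Both nameservices are visible to the client only -->
  <property>
    <name>dfs.nameservices</name>
    <value>clusterA,clusterB</value>
  </property>

  <!-- HA namenodes for clusterA -->
  <property>
    <name>dfs.ha.namenodes.clusterA</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.clusterA.nn1</name>
    <value>a-nn1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.clusterA.nn2</name>
    <value>a-nn2.example.com:8020</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.clusterA</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>

  <!-- HA namenodes for clusterB -->
  <property>
    <name>dfs.ha.namenodes.clusterB</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.clusterB.nn1</name>
    <value>b-nn1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.clusterB.nn2</name>
    <value>b-nn2.example.com:8020</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.clusterB</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
```

Assuming that file lives in a separate directory (the paths here are hypothetical), the copy could then be run with something like 'hadoop --config /path/to/client-conf distcp hdfs://clusterA/src hdfs://clusterB/dst'.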