hadoop-user mailing list archives

From Stanley Shi <s...@gopivotal.com>
Subject Re: Client usage with multiple clusters
Date Fri, 25 Apr 2014 06:04:58 GMT
My guess is to put two sets of these properties (one set per cluster)
dfs.ha.namenodes.clusterA=nn1,nn2
dfs.namenode.rpc-address.clusterA.nn1=
dfs.namenode.http-address.clusterA.nn1=
dfs.namenode.rpc-address.clusterA.nn2=
dfs.namenode.http-address.clusterA.nn2=

into the client configuration, and then access each cluster by its nameservice name, like hdfs://clusterA/tmp ...
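
A rough sketch of what that client-only configuration might look like, generated with a small Python script. The host names and ports below are hypothetical placeholders (the addresses are left blank in this thread), 8020 and 50070 are only assumed to match the usual Hadoop 2.x defaults, and the helper names client_properties and to_hdfs_site_xml are made up for illustration. The property names themselves are the ones quoted in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder hosts: the real addresses are not given in the thread.
CLUSTERS = {
    "clusterA": {"nn1": "nn1.clustera.example.com", "nn2": "nn2.clustera.example.com"},
    "clusterB": {"nn1": "nn1.clusterb.example.com", "nn2": "nn2.clusterb.example.com"},
}
RPC_PORT = 8020    # assumption: common NameNode RPC port
HTTP_PORT = 50070  # assumption: common Hadoop 2.x NameNode HTTP port

PROXY_PROVIDER = (
    "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
)


def client_properties(clusters):
    """Build one flat property map that describes every nameservice."""
    # The client needs to see all nameservices it will address by name.
    props = {"dfs.nameservices": ",".join(clusters)}
    for ns, namenodes in clusters.items():
        props[f"dfs.ha.namenodes.{ns}"] = ",".join(namenodes)
        for nn, host in namenodes.items():
            props[f"dfs.namenode.rpc-address.{ns}.{nn}"] = f"{host}:{RPC_PORT}"
            props[f"dfs.namenode.http-address.{ns}.{nn}"] = f"{host}:{HTTP_PORT}"
        # Each nameservice gets its own failover proxy provider entry.
        props[f"dfs.client.failover.proxy.provider.{ns}"] = PROXY_PROVIDER
    return props


def to_hdfs_site_xml(props):
    """Render the property map in the <configuration> layout of hdfs-site.xml."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_hdfs_site_xml(client_properties(CLUSTERS)))
```

The output would go into an hdfs-site.xml in a separate client-only configuration directory, so the datanodes on either cluster never read it. Something like "hadoop --config <client-conf-dir> distcp hdfs://clusterA/tmp hdfs://clusterB/tmp" should then be able to resolve both nameservices (the destination path here is just an example).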

Regards,
*Stanley Shi,*



On Fri, Apr 18, 2014 at 7:42 AM, david marion <dlmarion@hotmail.com> wrote:

>  I'm having an issue in client code where there are multiple clusters with
> HA namenodes involved. Example setup using Hadoop 2.3.0:
>
> Cluster A with the following properties defined in core, hdfs, etc:
>
> dfs.nameservices=clusterA
> dfs.ha.namenodes.clusterA=nn1,nn2
> dfs.namenode.rpc-address.clusterA.nn1=
> dfs.namenode.http-address.clusterA.nn1=
> dfs.namenode.rpc-address.clusterA.nn2=
> dfs.namenode.http-address.clusterA.nn2=
>
> dfs.client.failover.proxy.provider.clusterA=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
>
> Cluster B has similar properties defined in its core-site.xml,
> hdfs-site.xml, etc.
>
> Now, I want to be able to distcp from clusterA to clusterB. Regardless of
> which cluster I am executing this from, neither has all of the information.
> Looking at DFSClient and DataNode:
>
>   - if I put both clusterA and clusterB into dfs.nameservices, then the
> datanodes will try to federate the blocks from both nameservices.
>   - if I don't put both clusterA and clusterB into dfs.nameservices, then
> the client won't know how to resolve both namenodes for the nameservices in
> the distcp command.
>
>  I'm wondering if I am missing a property or something that will allow me
> to define both nameservices on both clusters and have the datanodes for the
> cluster *not* try and federate. Looking at DataNode, it appears that it
> tries to connect to all namenodes defined and the first one that sets the
> clusterid wins. It seems that there should be a dfs.datanode.clusterid
> property that the datanode uses. This seems to line up with 'namenode
> -format -clusterid <cluster>' command when you have multiple nameservices.
> Am I missing something in the configuration that will allow me to do what I
> want? To get distcp to work I had to create a third set of configuration files
> just for the client to use.
>
