hadoop-hdfs-user mailing list archives

From Kiss Tibor <kiss.ti...@gmail.com>
Subject Re: Using distcp from HDFS running on Amazon
Date Tue, 16 Nov 2010 22:45:10 GMT
Hi Bob,

If I create a Hadoop cluster on Amazon instances, I can access HDFS from
my laptop (or, in your case, from a different cluster) by using a SOCKS proxy.
I myself use a Whirr-based setup, which generates a hadoop-proxy.sh script
that sets up this SOCKS proxy for the client API (and also for web access).
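As a rough sketch of the client side, the following core-site.xml fragment points Hadoop's RPC layer at a local SOCKS proxy. The port 6666 is an assumption here (adjust it to wherever your proxy actually listens); the two property names are standard Hadoop client settings.

```xml
<!-- Client-side core-site.xml: route Hadoop RPC connections through a SOCKS proxy.
     localhost:6666 is a placeholder; use the port your hadoop-proxy.sh opens. -->
<configuration>
  <property>
    <name>hadoop.rpc.socket.factory.class.default</name>
    <value>org.apache.hadoop.net.SocksSocketFactory</value>
  </property>
  <property>
    <name>hadoop.socks.server</name>
    <value>localhost:6666</value>
  </property>
</configuration>
```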

Regarding your private IP address: initially I also thought the problem was
that the security group configures inter-node communication only on private
IP addresses. The problem was not the security group configuration, however,
but the fact that the namenode was configured with an IP address at all.
The namenode has to be configured with its public DNS name, because that name
resolves to a public IP address from outside Amazon and to a private IP
address from inside Amazon. In my case (with the Whirr cluster setup) the
IP-based configuration made it impossible to run the cluster itself correctly,
so I made a fix:
https://issues.apache.org/jira/browse/WHIRR-128
After applying this fix, I am able to run a cluster started by Whirr, and
I can access HDFS securely even from outside Amazon.
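Once the namenode is reachable under its public DNS name, the copy itself is an ordinary distcp between the two filesystems. A minimal sketch (both hostnames and paths below are placeholders, not taken from this thread):

```shell
# Run from a node of the destination cluster. The source namenode is addressed
# by its public EC2 DNS name so it resolves correctly from outside Amazon;
# both hostnames and the /user/bob/data paths are hypothetical examples.
hadoop distcp \
  hdfs://ec2-198-51-100-1.compute-1.amazonaws.com:8020/user/bob/data \
  hdfs://namenode.example.internal:8020/user/bob/data
```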

Cheers,
Tibor


On Tue, Nov 16, 2010 at 6:39 PM, Robert Goodman <bsgbmg@gmail.com> wrote:

> I have an Amazon cluster which is using HDFS (not S3). Is it possible to use
> distcp to copy files from an HDFS cluster running on Amazon to another cluster?
> The other cluster is not running on Amazon. It doesn't look like this is
> possible, because the namenode gets configured with a private IP address
> which is not accessible from outside the cluster. Does anybody know a way
> around the problem?
>
>    Thanks
>     Bob
>
>
