hadoop-common-user mailing list archives

From Stu Hood <stuh...@webmail.us>
Subject Re: Hadoop behind a Firewall
Date Tue, 11 Sep 2007 22:06:39 GMT
We would definitely limit the IP ranges that were allowed to connect via the external IP to
prevent complete access: the clients in this case would be in other data centers with known

I'm less concerned with being able to submit jobs remotely than I am with being able to access the
DFS remotely. The plan was to have other data centers act as Hadoop clients and push new files.
Perhaps I should look for a solution that puts all of the HClients inside the firewall?


-----Original Message-----
From: Ted Dunning 
Sent: Tuesday, September 11, 2007 5:52pm
To: hadoop-user@lucene.apache.org
Subject: Re: Hadoop behind a Firewall

If the only purpose of the clients is to launch map-reduce jobs, you may be
able to get away with some DNS evil to limit the number of external IPs.
You can use the diagnostic HTTP interfaces as well to see data with limited
access.  Other than such severely limited operation, you will be hard
pressed because the whole point of HDFS is that the client communicates
directly with the datanode when reading or writing.
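
To make the point above concrete, here is a minimal Python sketch of the read
path as described in this thread: the client asks the namenode where a file
lives, then connects to a datanode itself. This is a toy model, not Hadoop
code. The class and function names, the 10.0.0.x addresses, and port 50010 are
all assumptions chosen for illustration.

```python
# Hypothetical model of the HDFS read path; not actual Hadoop code.
INTERNAL_PREFIX = "10."  # assumed private address range behind the firewall


class NameNode:
    """Toy namenode: maps a file path to the datanode addresses holding it."""

    def __init__(self, block_map):
        self.block_map = block_map

    def get_block_locations(self, path):
        # The toy namenode answers with the addresses the datanodes are
        # known by inside the cluster, i.e. their internal addresses.
        return list(self.block_map[path])


def reachable_from_outside(address):
    """An external client cannot route to private (internal) addresses."""
    host, _port = address.split(":")
    return not host.startswith(INTERNAL_PREFIX)


def read_file(namenode, path):
    """Step 1: ask the namenode. Step 2: connect to a datanode directly."""
    for address in namenode.get_block_locations(path):
        if reachable_from_outside(address):
            return f"read {path} from {address}"
    raise ConnectionError(f"no reachable datanode for {path}")


namenode = NameNode({"/logs/a.txt": ["10.0.0.11:50010", "10.0.0.12:50010"]})
try:
    result = read_file(namenode, "/logs/a.txt")
except ConnectionError as exc:
    result = f"failed: {exc}"
print(result)
```

In this model the namenode lookup succeeds but the second step fails for an
outside client, which is why forwarding only the namenode port is not enough.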

What is the rationale for this firewall arrangement?  Since HDFS has no
permissions, any access is about the same as complete access.

On 9/11/07 2:40 PM, "Stu Hood" wrote:

> Hey gang,
> We're getting ready to deploy our first cluster, and while deciding on the
> node layout, we ran into an interesting question.
> The cluster will be behind a firewall, and a few clients will be on the
> outside. We'd like to minimize the number of external IPs we use, and provide
> a single IP address with forwarded ports for each node (using iptables).
> We've used this method before with simpler "client -> server" protocols, but
> because of Hadoop's "client -> namenode -> client -> datanode" protocol, I'm
> assuming this will not work by default.
> Is it possible to configure the namenode to send clients a different external
> IP/port for the datanodes than the one it uses when it communicates directly?
> Thanks a lot!
> Stu Hood
> Webmail.us
> "You manage your business. We'll manage your email."®
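
The single-IP layout described in the quoted message can be sketched in Python
as a mapping from each internal datanode address to a distinct port on one
external IP, together with the iptables DNAT rules that would implement the
forwarding. Everything here is hypothetical: the addresses, the port numbers,
and the `translate` helper, which only shows what an outside client would need
to be told. Nothing in this thread confirms that the namenode can be
configured to hand out such translated addresses.

```python
# Hypothetical addresses and ports, for illustration only.
EXTERNAL_IP = "203.0.113.5"   # documentation-range address (RFC 5737)
DATANODE_PORT = 50010         # assumed datanode data-transfer port
BASE_EXTERNAL_PORT = 51000    # assumed first forwarded port


def build_forward_table(internal_hosts):
    """Assign each internal datanode a unique port on the single external IP."""
    return {
        f"{host}:{DATANODE_PORT}": f"{EXTERNAL_IP}:{BASE_EXTERNAL_PORT + i}"
        for i, host in enumerate(internal_hosts)
    }


def iptables_rules(table):
    """Render one DNAT rule per datanode (returned as text, not executed)."""
    rules = []
    for internal, external in table.items():
        ext_ip, ext_port = external.split(":")
        rules.append(
            f"iptables -t nat -A PREROUTING -d {ext_ip} -p tcp "
            f"--dport {ext_port} -j DNAT --to-destination {internal}"
        )
    return rules


def translate(locations, table):
    """Addresses an external client would need instead of the internal ones."""
    return [table[address] for address in locations]


table = build_forward_table(["10.0.0.11", "10.0.0.12"])
rules = iptables_rules(table)
for rule in rules:
    print(rule)
print(translate(["10.0.0.12:50010"], table))
```

The forwarding rules alone do not help unless the client is given the external
address for each datanode, which is exactly the translation step the question
asks about.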
