hadoop-common-user mailing list archives

From Michael Bieniosek <mich...@powerset.com>
Subject Re: Hadoop behind a Firewall
Date Tue, 11 Sep 2007 22:22:32 GMT
While you can proxy puts/gets to HDFS, this can dramatically decrease your
bandwidth.  The hadoop dfs client is pretty good about writing to/reading
from multiple HDFS nodes simultaneously; a proxy makes this impossible.
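
For reference, a direct write from a client looks like the sketch below
(hostname and path are made-up examples); each such writer streams to the
datanodes on its own, and that per-client fan-out is exactly what a single
proxy takes away:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DirectPut {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Made-up namenode address; the namenode brokers metadata only.
            conf.set("fs.default.name", "hdfs://namenode.example.com:9000");
            FileSystem fs = FileSystem.get(conf);
            // The bytes stream from this client straight to the datanodes;
            // that is the traffic a proxy would funnel through one box.
            FSDataOutputStream out = fs.create(new Path("/incoming/data.log"));
            out.write("some bytes".getBytes());
            out.close();
            fs.close();
        }
    }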

Of course, depending on your cluster size, network connection, and data
size, you may not care.

-Michael

On 9/11/07 3:15 PM, "Ted Dunning" <tdunning@veoh.com> wrote:

> 
> It is pretty easy to have a proxy of some kind accept files to be put into
> HDFS.  Make sure that the proxy doesn't preferentially write to itself;
> HDFS likes to put the first replica of each block on the node the writer
> is running on.  The easiest way to avoid that is to have the proxy on a
> machine outside the HDFS cluster.
> 
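> A toy version of such a proxy (hostnames, paths, and the port are all made
> up, and it leans on Java 6's built-in HttpServer) might look like the
> following; it accepts an HTTP upload and rewrites it into HDFS:
> 
>     import com.sun.net.httpserver.HttpExchange;
>     import com.sun.net.httpserver.HttpHandler;
>     import com.sun.net.httpserver.HttpServer;
>     import java.io.IOException;
>     import java.io.InputStream;
>     import java.net.InetSocketAddress;
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.fs.FSDataOutputStream;
>     import org.apache.hadoop.fs.FileSystem;
>     import org.apache.hadoop.fs.Path;
> 
>     public class PutProxy {
>         public static void main(String[] args) throws Exception {
>             Configuration conf = new Configuration();
>             conf.set("fs.default.name", "hdfs://namenode.example.com:9000");
>             final FileSystem fs = FileSystem.get(conf);
>             HttpServer server =
>                 HttpServer.create(new InetSocketAddress(8080), 0);
>             server.createContext("/put", new HttpHandler() {
>                 public void handle(HttpExchange ex) throws IOException {
>                     // Toy naming scheme: /put?name=somefile
>                     String name =
>                         ex.getRequestURI().getQuery().split("=")[1];
>                     InputStream body = ex.getRequestBody();
>                     FSDataOutputStream out =
>                         fs.create(new Path("/incoming/" + name));
>                     byte[] buf = new byte[4096];
>                     for (int n = body.read(buf); n > 0; n = body.read(buf)) {
>                         out.write(buf, 0, n);
>                     }
>                     out.close();
>                     ex.sendResponseHeaders(200, 0);
>                     ex.getResponseBody().close();
>                 }
>             });
>             server.start();
>         }
>     }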
> 
> On 9/11/07 3:06 PM, "Stu Hood" <stuhood@webmail.us> wrote:
> 
>> We would definitely limit the IP ranges that were allowed to connect via the
>> external IP to prevent complete access: the clients in this case would be in
>> other data centers with known addresses.
>> 
>> I'm less concerned with being able to submit jobs remotely than with being
>> able to access the DFS remotely. The plan was to have other data centers
>> act as Hadoop clients and push new files. Perhaps I should look for a
>> solution that puts all of the HClients inside the firewall?
>> 
>> Thanks,
>> Stu
>> 
>> 
>> 
>> -----Original Message-----
>> From: Ted Dunning
>> Sent: Tuesday, September 11, 2007 5:52pm
>> To: hadoop-user@lucene.apache.org
>> Subject: Re: Hadoop behind a Firewall
>> 
>> 
>> 
>> If the only purpose of the clients is to launch map-reduce jobs, you may be
>> able to get away with some DNS evil to limit the number of external IPs.
>> You can use the diagnostic HTTP interfaces as well to see data with limited
>> access.  Other than such severely limited operation, you will be hard
>> pressed, because the whole point of HDFS is that the client communicates
>> directly with the datanodes when reading or writing.
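>> For instance, the namenode's web UI (port 50070 in a stock config; the
>> hostname here is made up) can be read through a single forwarded port,
>> since no datanode connection is involved:
>> 
>>     import java.io.BufferedReader;
>>     import java.io.InputStreamReader;
>>     import java.net.URL;
>> 
>>     public class WebPeek {
>>         public static void main(String[] args) throws Exception {
>>             // Only the namenode's web port needs to be reachable.
>>             URL ui = new URL("http://namenode.example.com:50070/");
>>             BufferedReader in = new BufferedReader(
>>                     new InputStreamReader(ui.openStream()));
>>             for (String line; (line = in.readLine()) != null; ) {
>>                 System.out.println(line);
>>             }
>>             in.close();
>>         }
>>     }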
>> 
>> What is the rationale for this firewall arrangement?  Since HDFS has no
>> permissions, any access is about the same as complete access.
>> 
>> 
>> On 9/11/07 2:40 PM, "Stu Hood"  wrote:
>> 
>>> Hey gang,
>>> 
>>> We're getting ready to deploy our first cluster, and while deciding on the
>>> node layout, we ran into an interesting question.
>>> 
>>> The cluster will be behind a firewall, and a few clients will be on the
>>> outside. We'd like to minimize the number of external IPs we use, and
>>> provide a single IP address with forwarded ports for each node (using
>>> iptables).
>>> 
>>> We've used this method before with simpler "client -> server" protocols,
>>> but because of Hadoop's "client -> namenode -> client -> datanode"
>>> protocol, I'm assuming this will not work by default.
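>>> 
>>> As I understand it, a remote read would go roughly like the sketch below
>>> (hostname and path are made up): the client gets block locations from the
>>> namenode, then dials the datanodes directly at the internal addresses the
>>> namenode handed back.
>>> 
>>>     import org.apache.hadoop.conf.Configuration;
>>>     import org.apache.hadoop.fs.FSDataInputStream;
>>>     import org.apache.hadoop.fs.FileSystem;
>>>     import org.apache.hadoop.fs.Path;
>>> 
>>>     public class DirectRead {
>>>         public static void main(String[] args) throws Exception {
>>>             Configuration conf = new Configuration();
>>>             // Made-up namenode address.
>>>             conf.set("fs.default.name", "hdfs://namenode.example.com:9000");
>>>             FileSystem fs = FileSystem.get(conf);
>>>             // open() fetches block locations from the namenode; read()
>>>             // then connects straight to the datanodes it was told about.
>>>             FSDataInputStream in = fs.open(new Path("/some/file"));
>>>             byte[] buf = new byte[4096];
>>>             for (int n = in.read(buf); n > 0; n = in.read(buf)) {
>>>                 System.out.write(buf, 0, n);
>>>             }
>>>             in.close();
>>>             fs.close();
>>>         }
>>>     }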
>>> 
>>> Is it possible to configure the namenode to send clients a different
>>> external IP/port for the datanodes than the one it uses when it
>>> communicates directly?
>>> 
>>> Thanks a lot!
>>> 
>>> Stu Hood
>>> 
>>> Webmail.us
>>> 
>>> "You manage your business. We'll manage your email."®
>> 
> 

