hadoop-common-user mailing list archives

From Raghu Angadi <rang...@yahoo-inc.com>
Subject Re: Circumventing Hadoop's data placement policy
Date Sat, 23 May 2009 23:31:59 GMT
As a hack, you could tunnel NN traffic from the GridFTP clients through a 
different machine (by changing fs.default.name). Alternatively, these 
clients could use a SOCKS proxy.

The amount of traffic to the NN is small, so tunneling should not affect 
performance.
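
For illustration, here is a minimal sketch of the client-side overrides each 
option might use, generated as Hadoop-style configuration XML. The relay host 
relay.example.org and both ports are hypothetical placeholders. The SOCKS 
variant assumes the stock properties hadoop.rpc.socket.factory.class.default 
and hadoop.socks.server; whether datanode connections also go through that 
socket factory should be checked against your Hadoop version before relying 
on it.

```python
import xml.etree.ElementTree as ET


def build_client_conf(props):
    """Render a dict of property names/values as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Option 1: point the client at a relay that forwards to the real NN.
# "relay.example.org:9000" is a hypothetical tunnel endpoint.
tunnel_conf = build_client_conf({
    "fs.default.name": "hdfs://relay.example.org:9000",
})

# Option 2: keep fs.default.name as-is and send client sockets through a SOCKS proxy.
# "relay.example.org:1080" is a hypothetical SOCKS server.
socks_conf = build_client_conf({
    "hadoop.rpc.socket.factory.class.default":
        "org.apache.hadoop.net.SocksSocketFactory",
    "hadoop.socks.server": "relay.example.org:1080",
})

print(tunnel_conf)
print(socks_conf)
```

Either fragment would go into the configuration seen only by the co-located 
GridFTP clients, so the datanodes themselves keep their normal settings.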

Raghu.

Brian Bockelman wrote:
> Hey all,
> 
> Had a problem I wanted to ask advice on.  The Caltech site I work with 
> currently has a few GridFTP servers which are on the same physical 
> machines as the Hadoop datanodes, and a few that aren't.  The GridFTP 
> server has a libhdfs backend which writes incoming network data into HDFS.
> 
> They've found that the GridFTP servers which are co-located with an HDFS 
> datanode have poor performance because data is incoming at a much faster 
> rate than the HDD can handle.  The standalone GridFTP servers, however, 
> push data out to multiple nodes at once, and can handle the incoming data 
> just fine (>200MB/s).
> 
> Is there any way to turn off the preference for the local node?  Can 
> anyone think of a good workaround to trick HDFS into thinking the client 
> isn't on the same node?
> 
> Brian

