hadoop-common-dev mailing list archives

From Sanjay Radia <sra...@yahoo-inc.com>
Subject Re: Multiplexing sockets in DFSClient/datanodes?
Date Wed, 12 Mar 2008 18:35:49 GMT
Doug Cutting wrote:
> Jim Kellerman wrote:
>> Yes, multiplexing a socket is more complicated than having one socket
>> per file, but saving system resources seems like a way to scale.
>> Questions? Comments? Opinions? Flames?
> Note that Hadoop RPC already multiplexes, sharing a single socket per 
> pair of JVMs.  It would be possible to multiplex datanode, and should 
> not in theory significantly impact performance, but, as you indicate, 
> it would be a significant change.  One approach might be to implement 
> HDFS data access using RPC rather than directly using stream i/o.
> RPC also tears down idle connections, which HDFS does not.  I wonder 
> how much doing that alone might help your case?  That would probably 
> be much simpler to implement.  Both client and server must already 
> handle connection failures, so it shouldn't be too great of a change 
> to have one or both sides actively close things down if they're idle 
> for more than a few seconds.  This is related to adding write timeouts 
> to the datanode (HADOOP-2346).

Dhruba and I had discussed using RPC in the past. While RPC is a
cleaner interface and our RPC implementation has features such as
connection sharing and closing idle connections, streaming IO lets us
pipe large amounts of data without the request/response exchange.
The worry was that IO performance would degrade.
BTW, NFS uses RPC (NFS does not have the write pipeline for replicas).
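
To put a rough number on the request/response worry, here is a
back-of-the-envelope sketch. It is not a measurement of Hadoop RPC or
the datanode protocol. It assumes each RPC call carries one chunk and
waits for its reply before the next call is sent, while a stream pays
a single round trip up front. The function names, chunk size, round
trip time and bandwidth are all made up for illustration.

```python
import math


def rpc_transfer_time(total_bytes, chunk_bytes, rtt_s, bytes_per_s):
    """Estimated seconds when every chunk is a separate call/reply pair.

    Assumption: calls are not overlapped, so each chunk costs one full
    round trip on top of the time spent moving the bytes.
    """
    calls = math.ceil(total_bytes / chunk_bytes)
    return calls * rtt_s + total_bytes / bytes_per_s


def streaming_transfer_time(total_bytes, rtt_s, bytes_per_s):
    """Estimated seconds when the data is piped after one initial round trip."""
    return rtt_s + total_bytes / bytes_per_s


if __name__ == "__main__":
    block = 64 * 1024 * 1024   # hypothetical 64 MiB block
    chunk = 64 * 1024          # hypothetical 64 KiB per call
    rtt = 0.001                # hypothetical 1 ms round trip
    rate = 64 * 1024 * 1024    # hypothetical 64 MiB/s link

    print("rpc      :", rpc_transfer_time(block, chunk, rtt, rate))
    print("streaming:", streaming_transfer_time(block, rtt, rate))
```

In this model the gap is just the number of calls times the round
trip, so larger chunks or overlapping several calls would shrink it.
Whether a real RPC-based data path would behave this way would have to
be measured.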

> Doug
