Doug Cutting wrote:
> Jim Kellerman wrote:
>> Yes, multiplexing a socket is more complicated than having one socket
>> per file, but saving system resources seems like a way to scale.
>>
>> Questions? Comments? Opinions? Flames?
>
> Note that Hadoop RPC already multiplexes, sharing a single socket per
> pair of JVMs. It would be possible to multiplex datanode connections,
> and that should not in theory significantly impact performance, but, as you indicate,
> it would be a significant change. One approach might be to implement
> HDFS data access using RPC rather than directly using stream i/o.
>
> RPC also tears down idle connections, which HDFS does not. I wonder
> how much doing that alone might help your case? That would probably
> be much simpler to implement. Both client and server must already
> handle connection failures, so it shouldn't be too great of a change
> to have one or both sides actively close things down if they're idle
> for more than a few seconds. This is related to adding write timeouts
> to the datanode (HADOOP-2346).
Doug,
Dhruba and I had discussed using RPC in the past. While RPC is a
cleaner interface and our RPC implementation has features such as
connection sharing, closing idle connections, etc., streaming IO
lets us pipe large amounts of data without the request/response
exchange. The worry was that IO performance would degrade.
BTW, NFS uses RPC (NFS does not have the write pipeline for replicas).
sanjay
>
> Doug
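
To make the idle-teardown suggestion above concrete, here is a minimal sketch of the bookkeeping one side of a connection might do. It is written in Python for brevity and is not Hadoop code: the names IdleConnectionTable, FakeConnection, and FakeClock are hypothetical, the 5-second limit is only an illustration of "a few seconds", and the clock is injectable so the behaviour can be exercised without real sockets or sleeping.

```python
import time


class IdleConnectionTable:
    """Tracks open connections and closes any that have been idle too long.

    Hypothetical helper for illustration; not part of Hadoop or HDFS.
    """

    def __init__(self, max_idle_seconds, clock=time.monotonic):
        self.max_idle_seconds = max_idle_seconds
        self.clock = clock
        self._last_used = {}  # connection -> time of last activity

    def touch(self, conn):
        """Record activity on conn; call this on every read or write."""
        self._last_used[conn] = self.clock()

    def close_idle(self):
        """Close and forget every connection idle longer than the limit.

        Returns the list of connections that were closed.
        """
        now = self.clock()
        idle = [conn for conn, last in self._last_used.items()
                if now - last > self.max_idle_seconds]
        for conn in idle:
            del self._last_used[conn]
            conn.close()
        return idle

    def __len__(self):
        return len(self._last_used)


class FakeConnection:
    """Stand-in for a socket; only records whether close() was called."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True


class FakeClock:
    """Manually advanced clock so the example needs no real waiting."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


if __name__ == "__main__":
    clock = FakeClock()
    table = IdleConnectionTable(max_idle_seconds=5, clock=clock)
    busy, quiet = FakeConnection("busy"), FakeConnection("quiet")
    table.touch(busy)
    table.touch(quiet)
    clock.advance(4)
    table.touch(busy)    # only "busy" sees further traffic
    clock.advance(2)     # "quiet" has now been idle for 6 seconds
    for conn in table.close_idle():
        print("closed idle connection:", conn.name)
```

In a real client or server, close_idle() would be called periodically from a timer or housekeeping thread, and the peer would see the close as an ordinary connection failure and reconnect on its next request, which is the failure path both sides must already handle.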