hadoop-common-dev mailing list archives

From Hairong Kuang <hair...@yahoo-inc.com>
Subject Re: Multiplexing sockets in DFSClient/datanodes?
Date Wed, 12 Mar 2008 19:17:57 GMT
> streaming IO lets us pipe large amounts
> of data without the request/response exchange.
> The worry was that IO performance would degrade.

Since HADOOP-2188 removes the IPC timeout, it is OK for a datanode to respond
to the datanode upstream of it in the pipeline when it gets a response from
the datanode downstream of it. If datanodes could have two threads, one
pushing data down the pipeline and one writing it to the local disk, using
RPC wouldn't introduce any additional communication cost.
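
A minimal sketch of the two-thread idea, in Java. This is not the datanode
code; PipelineStage, receive, finish and the queue names are hypothetical,
and the RPC acknowledgement path is left out. It only shows that each
incoming packet can be handed to a disk-writer thread and a forwarder thread
that run independently, so neither the local write nor the downstream send
waits on the other.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration: one pipeline stage with two worker threads.
public class PipelineStage {
    // Sentinel marking end of stream; compared by identity, never written.
    private static final byte[] EOF = new byte[0];

    private final BlockingQueue<byte[]> toDisk = new LinkedBlockingQueue<>();
    private final BlockingQueue<byte[]> toDownstream = new LinkedBlockingQueue<>();
    private final Thread diskWriter;
    private final Thread forwarder;

    public PipelineStage(OutputStream disk, OutputStream downstream) {
        // One thread writes packets to local storage ...
        diskWriter = new Thread(() -> drain(toDisk, disk), "disk-writer");
        // ... while the other pushes the same packets down the pipeline.
        forwarder = new Thread(() -> drain(toDownstream, downstream), "forwarder");
        diskWriter.start();
        forwarder.start();
    }

    // Called for each packet arriving from upstream; hands it to both threads.
    public void receive(byte[] packet) throws InterruptedException {
        toDisk.put(packet);
        toDownstream.put(packet);
    }

    // Signals end of stream and waits until both threads have drained.
    public void finish() throws InterruptedException {
        toDisk.put(EOF);
        toDownstream.put(EOF);
        diskWriter.join();
        forwarder.join();
    }

    // Writes queued packets to the given stream until the sentinel is seen.
    private static void drain(BlockingQueue<byte[]> queue, OutputStream out) {
        try {
            for (byte[] p = queue.take(); p != EOF; p = queue.take()) {
                out.write(p);
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In a real datanode the two streams would be a block file and a connection to
the next datanode; here any pair of OutputStreams will do for experimenting.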

Hairong

On 3/12/08 11:35 AM, "Sanjay Radia" <sradia@yahoo-inc.com> wrote:

> Doug Cutting wrote:
>> Jim Kellerman wrote:
>>> Yes, multiplexing a socket is more complicated than having one socket
>>> per file, but saving system resources seems like a way to scale.
>>> 
>>> Questions? Comments? Opinions? Flames?
>> 
>> Note that Hadoop RPC already multiplexes, sharing a single socket per
>> pair of JVMs.  It would be possible to multiplex datanode, and should
>> not in theory significantly impact performance, but, as you indicate,
>> it would be a significant change.  One approach might be to implement
>> HDFS data access using RPC rather than directly using stream i/o.
>> 
>> RPC also tears down idle connections, which HDFS does not.  I wonder
>> how much doing that alone might help your case?  That would probably
>> be much simpler to implement.  Both client and server must already
>> handle connection failures, so it shouldn't be too great of a change
>> to have one or both sides actively close things down if they're idle
>> for more than a few seconds.  This is related to adding write timeouts
>> to the datanode (HADOOP-2346).
> 
> Doug,
>    Dhruba and I had discussed using RPC in the past. While RPC is a
> cleaner interface and our RPC implementation has
> features such as connection sharing, closing idle connections, etc.,
> streaming IO lets us pipe large amounts
> of data without the request/response exchange.
> The worry was that IO performance would degrade.
> BTW, NFS uses RPC (NFS does not have the write pipeline for replicas).
> 
> sanjay
>> 
>> Doug
> 

