hadoop-common-dev mailing list archives

From Raghu Angadi <rang...@yahoo-inc.com>
Subject Re: Multiplexing sockets in DFSClient/datanodes?
Date Sun, 16 Mar 2008 05:30:22 GMT

There are many resources consumed by an open dfs file: fds, sockets,
socket buffers, threads ... etc.

Better questions to consider might be "How do we support a very large
number of open files in HDFS?", which, I think, opens it up to more
kinds of solutions than just one, and "What compromises (if any) are
acceptable to achieve this?".

I know it's a serious problem for HBase, and every fix, incremental or
not, helps. Having a short write timeout on the DataNode in current
trunk will help greatly on the DataNode side (threads and sockets). Of
course, we need to make the write timeout configurable, which is trivial.
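Making the timeout configurable really is a small change. A minimal sketch, assuming the property name "dfs.datanode.socket.write.timeout" (the name this knob later took in Hadoop; treated here as an assumption) and an 8-minute default:

```java
import java.util.Properties;

// Sketch: read the DataNode write timeout from configuration, falling
// back to a fixed default when the property is not set. Illustrative
// only, not the actual DataNode code.
public class WriteTimeoutConfig {
    // Assumed default: 8 minutes, in milliseconds.
    static final long DEFAULT_WRITE_TIMEOUT_MS = 8 * 60 * 1000;

    static long writeTimeout(Properties conf) {
        String v = conf.getProperty("dfs.datanode.socket.write.timeout");
        return v == null ? DEFAULT_WRITE_TIMEOUT_MS : Long.parseLong(v);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(writeTimeout(conf));  // default applies
        conf.setProperty("dfs.datanode.socket.write.timeout", "60000");
        System.out.println(writeTimeout(conf));  // override applies
    }
}
```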

One connection between every client and datanode might not be as
scalable on a large cluster. Say the cluster has 3000 datanodes and a
client has 5000 files open to essentially random datanodes. Then the
number of connections from the client is still in the thousands (same
problem as now).

Raghu.
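Raghu's back-of-envelope point checks out under a simple model. Assuming (my simplification) that each open file maps to a block on a datanode chosen uniformly at random, the expected number of distinct datanodes a client must stay connected to is:

```java
// Expected number of distinct datanodes contacted when k open files hit
// blocks placed uniformly at random across n datanodes. A toy model of
// Raghu's 3000-datanode / 5000-open-file example, not Hadoop code.
public class ConnectionMath {
    static double expectedDistinct(int n, int k) {
        // P(a given datanode serves none of the k blocks) = (1 - 1/n)^k,
        // so E[distinct datanodes] = n * (1 - (1 - 1/n)^k).
        return n * (1.0 - Math.pow(1.0 - 1.0 / n, k));
    }

    public static void main(String[] args) {
        // ~2400 distinct connections: one-per-datanode pooling alone
        // still leaves the client with thousands of sockets.
        System.out.printf("~%.0f distinct connections%n",
                          expectedDistinct(3000, 5000));
    }
}
```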

dhruba Borthakur wrote:
> Hi Jim,
> 
> Oh, I see. This does not sound too difficult. One can use the connection
> pooling code from the RPC layer. The DFS Client can use the pool to
> cache open connections.  Also, I assumed that this connection pooling is
> enabled only for block reads and not for block writes.
> 
> Would you like to open a JIRA so that we can discuss it in more detail?
> 
> Thanks,
> dhruba
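The pooling Dhruba describes above, caching at most one open connection per datanode and reusing it across block reads, might look roughly like this. All names here are illustrative, not actual DFSClient internals:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.HashMap;
import java.util.Map;

// Sketch of a per-datanode connection cache for block reads.
public class DatanodeConnectionPool {
    private final Map<InetSocketAddress, Socket> pool = new HashMap<>();

    // Return the cached connection if one is still open; otherwise dial
    // a fresh one and remember it.
    synchronized Socket connect(InetSocketAddress datanode) throws IOException {
        Socket s = pool.get(datanode);
        if (s == null || s.isClosed()) {
            s = new Socket();
            s.connect(datanode);
            pool.put(datanode, s);
        }
        return s;
    }

    // Close and drop everything, e.g. on client shutdown.
    synchronized void closeAll() throws IOException {
        for (Socket s : pool.values()) s.close();
        pool.clear();
    }

    public static void main(String[] args) throws IOException {
        ServerSocket srv = new ServerSocket(0); // stand-in "datanode"
        DatanodeConnectionPool pool = new DatanodeConnectionPool();
        InetSocketAddress addr =
            new InetSocketAddress("127.0.0.1", srv.getLocalPort());
        Socket first = pool.connect(addr);
        Socket second = pool.connect(addr); // reused, not re-dialed
        System.out.println("same connection reused: " + (first == second));
        pool.closeAll();
        srv.close();
    }
}
```

As discussed in the thread, transfers over a shared connection would then be queued and processed one at a time rather than interleaved.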
> 
> -----Original Message-----
> From: Jim Kellerman [mailto:jim@powerset.com] 
> Sent: Friday, March 14, 2008 1:01 PM
> To: core-dev@hadoop.apache.org; hadoop-dev@lucene.apache.org
> Subject: RE: Multiplexing sockets in DFSClient/datanodes?
> 
> I'm not suggesting doing simultaneous transfers, just having one
> connection between any one client and any one data node. My thinking was
> each transfer would be queued and then processed one at a time.
> 
> This is a big problem for us. On our cluster at Powerset, we have had
> both datanodes and HBase region servers run out of file handles because
> one handle is held open per file.
> 
> As HBase installations get larger one socket per file just won't scale.
> 
> ---
> Jim Kellerman, Senior Engineer; Powerset
> 
> 
>> -----Original Message-----
>> From: dhruba Borthakur [mailto:dhruba@yahoo-inc.com]
>> Sent: Friday, March 14, 2008 10:53 AM
>> To: core-dev@hadoop.apache.org; hadoop-dev@lucene.apache.org
>> Subject: RE: Multiplexing sockets in DFSClient/datanodes?
>>
>> Hi Jim,
>>
>> The protocol between the client and the Datanodes will become
>> relatively more complex if we decide to multiplex
>> simultaneous transfers of multiple blocks on the same socket
>> connection. Do you think that the benefit of saving on system
>> resources is really appreciable?
>>
>> Thanks,
>> Dhruba
>>
>> -----Original Message-----
>> From: Sanjay Radia [mailto:sradia@yahoo-inc.com]
>> Sent: Wednesday, March 12, 2008 11:36 AM
>> To: hadoop-dev@lucene.apache.org
>> Subject: Re: Multiplexing sockets in DFSClient/datanodes?
>>
>> Doug Cutting wrote:
>>> Jim Kellerman wrote:
>>>> Yes, multiplexing a socket is more complicated than having
>>>> one socket per file, but saving system resources seems like
>>>> a way to scale.
>>>>
>>>> Questions? Comments? Opinions? Flames?
>>> Note that Hadoop RPC already multiplexes, sharing a single
>>> socket per pair of JVMs.  It would be possible to multiplex
>>> datanode connections too, and in theory it should not
>>> significantly impact performance, but, as you indicate, it
>>> would be a significant change.  One approach might be to
>>> implement HDFS data access using RPC rather than directly
>>> using stream i/o.
>>>
>>> RPC also tears down idle connections, which HDFS does not.
>>> I wonder how much doing that alone might help your case?
>>> That would probably be much simpler to implement.  Both
>>> client and server must already handle connection failures,
>>> so it shouldn't be too great of a change to have one or both
>>> sides actively close things down if they're idle for more
>>> than a few seconds.  This is related to adding write
>>> timeouts to the datanode (HADOOP-2346).
>> Doug,
>>    Dhruba and I had discussed using RPC in the past. While
>> RPC is a cleaner interface and our rpc implementation has
>> features such as connection sharing, closing idle
>> connections, etc., streaming IO lets us pipe large amounts
>> of data without the request/response exchange.
>> The worry was that IO performance would degrade.
>> BTW, NFS uses rpc (NFS does not have the write pipeline for
>> replicas).
>>
>> sanjay
>>> Doug
>>
> 
> 
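Doug's simpler idea upthread, tearing down idle datanode connections the way the RPC layer does, amounts to tracking when each connection was last used and closing anything idle past a threshold. A minimal sketch (illustrative names, not the RPC layer's actual code):

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Sketch: remember the last-use time of each connection and reap any
// that have been idle longer than maxIdleMs.
public class IdleReaper {
    private final Map<String, Long> lastUsed = new HashMap<>();
    private final long maxIdleMs;

    IdleReaper(long maxIdleMs) { this.maxIdleMs = maxIdleMs; }

    // Record that a connection was just used.
    void touch(String conn, long nowMs) { lastUsed.put(conn, nowMs); }

    // Drop every connection idle past the threshold (a real client
    // would also close its socket here); returns the count reaped.
    int reap(long nowMs) {
        int closed = 0;
        Iterator<Map.Entry<String, Long>> it = lastUsed.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().getValue() > maxIdleMs) {
                it.remove();
                closed++;
            }
        }
        return closed;
    }

    public static void main(String[] args) {
        IdleReaper reaper = new IdleReaper(5000); // 5 s idle limit
        reaper.touch("dn1", 0);
        reaper.touch("dn2", 4000);
        // At t=6000 only dn1 has been idle longer than 5 s.
        System.out.println("reaped at t=6000: " + reaper.reap(6000));
    }
}
```

Since both client and server already handle connection failures, as Doug notes, either side can run such a reaper without new failure modes.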

