hadoop-hdfs-issues mailing list archives

From "jinglong.liujl (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1325) DFSClient(DFSInputStream) release the persistent connection with datanode when no data have been read for a long time
Date Thu, 05 Aug 2010 02:13:17 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895514#action_12895514 ]

jinglong.liujl commented on HDFS-1325:

>Yes. Following from this direction, we probably should limit the number of open files,
like the file descriptor limit in Unix.

Of course, closing files is necessary, but if a user doesn't close his files, or there are bugs
in his application, we, as a distributed system, should still keep serving, right?
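For context, a minimal sketch of the application-side discipline being assumed here (not taken
from this issue's patches): always close the HDFS input stream in a finally block so the
DFSClient can release its datanode connection. The path below is a placeholder.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CloseStreamExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // placeholder path for illustration
            FSDataInputStream in = fs.open(new Path("/user/example/data.txt"));
            try {
                byte[] buf = new byte[4096];
                while (in.read(buf) != -1) {
                    // process data ...
                }
            } finally {
                in.close();   // lets the client release the underlying datanode connection
            }
        }
    }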

>In the patch, a new TimeoutChecker thread is started for each DFSInputStream. It is very
expensive. All clients, idle or not, have to pay for it.

Yes, creating a new thread is not cheap on a single machine, but I think we should look at what
the bottleneck in our system actually is. If the number of connections can bring a machine down,
a watch-dog thread (TimeoutChecker) will save it. Of course, putting the check into LeaseChecker
or another existing thread is also OK, but it is not very clear in the code structure.
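To illustrate the watch-dog idea in general terms, here is a minimal, hypothetical sketch of one
shared checker thread that closes the connections of streams that have been idle longer than a
timeout. This is not the code in dfsclient.patch; the IdleStream interface and the
lastReadTime()/closeIdleConnection() names are made up for illustration.

    import java.util.Set;
    import java.util.concurrent.CopyOnWriteArraySet;

    public class IdleTimeoutChecker extends Thread {
        // Hypothetical view of a stream that can report its last read time
        // and drop only its datanode socket (it may reconnect on the next read).
        interface IdleStream {
            long lastReadTime();
            void closeIdleConnection();
        }

        private final Set<IdleStream> streams = new CopyOnWriteArraySet<IdleStream>();
        private final long idleTimeoutMs;

        public IdleTimeoutChecker(long idleTimeoutMs) {
            this.idleTimeoutMs = idleTimeoutMs;
            setDaemon(true);
        }

        public void register(IdleStream s)   { streams.add(s); }
        public void unregister(IdleStream s) { streams.remove(s); }

        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                long now = System.currentTimeMillis();
                for (IdleStream s : streams) {
                    if (now - s.lastReadTime() > idleTimeoutMs) {
                        s.closeIdleConnection();
                    }
                }
                try {
                    Thread.sleep(idleTimeoutMs / 2);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }

Because the checker is shared, idle clients pay only for one daemon thread per DFSClient rather
than one per DFSInputStream, which is the cost concern raised in the quoted comment.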

> DFSClient(DFSInputStream) release the persistent connection with datanode when no data
have been read for a long time
> ---------------------------------------------------------------------------------------------------------------------
>                 Key: HDFS-1325
>                 URL: https://issues.apache.org/jira/browse/HDFS-1325
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>            Reporter: jinglong.liujl
>             Fix For: 0.20.3
>         Attachments: dfsclient.patch, toomanyconnction.patch
> When you use HBase over Hadoop, we found that during a scan over a large table (which has
many regions, each with many store files), too many connections are kept open between the
regionserver (acting as a DFSClient) and the datanodes. Even after a store file has been
completely scanned, its connections cannot be closed.
> In our cluster, these extra connections waste a lot of system resources, which drives the
system CPU on the region server to a high level and eventually brings the region server down.
> After investigating, we found that the number of active connections is very small and that
most connections are idle. We added a timeout-checker thread into DFSClient to close these
idle connections.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
