hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-374) HDFS needs to support a very large number of open files.
Date Mon, 21 Jul 2014 19:46:42 GMT

    [ https://issues.apache.org/jira/browse/HDFS-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069139#comment-14069139 ]

Colin Patrick McCabe commented on HDFS-374:

Oh, and using short-circuit reads mitigates this somewhat as well.  We can share the
same file descriptor across multiple open instances of a short-circuit block file.
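
To illustrate the descriptor sharing mentioned above, here is a minimal sketch of a reference-counted cache. It is not the HDFS short-circuit implementation; the class name SharedChannelCache and its methods are hypothetical, and it assumes readers use positional reads (FileChannel.read(ByteBuffer, long)) so that sharing one channel does not disturb a shared file position.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: NOT the HDFS short-circuit cache, only the general idea.
public class SharedChannelCache {
    // One open channel (one file descriptor) plus the number of current users.
    private static final class Entry {
        final FileChannel channel;
        int refCount;
        Entry(FileChannel channel) { this.channel = channel; }
    }

    private final Map<Path, Entry> open = new HashMap<>();

    // Returns the shared channel for a block file, opening it only on first use.
    public synchronized FileChannel acquire(Path blockFile) throws IOException {
        Entry e = open.get(blockFile);
        if (e == null) {
            e = new Entry(FileChannel.open(blockFile, StandardOpenOption.READ));
            open.put(blockFile, e);
        }
        e.refCount++;
        return e.channel;
    }

    // Drops one reference; the descriptor is closed when the last user releases it.
    public synchronized void release(Path blockFile) throws IOException {
        Entry e = open.get(blockFile);
        if (e == null) {
            throw new IllegalStateException("not open: " + blockFile);
        }
        if (--e.refCount == 0) {
            open.remove(blockFile);
            e.channel.close();
        }
    }

    // Number of descriptors currently held, regardless of how many readers share them.
    public synchronized int openDescriptors() {
        return open.size();
    }
}
```

With this shape, any number of readers of the same block file cost one descriptor in total, which is the effect described above.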

> HDFS needs to support a very large number of open files.
> --------------------------------------------------------
>                 Key: HDFS-374
>                 URL: https://issues.apache.org/jira/browse/HDFS-374
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Jim Kellerman
> Currently, DFSClient maintains one socket per open file. For most map/reduce operations,
> this is not a problem because there just aren't many open files.
> However, HBase has a very different usage model in which a single region server
> could have thousands (10**3 but less than 10**4) open files. This can cause both datanodes
> and region servers to run out of file handles.
> What I would like to see is one connection for each (dfsClient, datanode) pair. This would
> reduce the number of connections to hundreds or tens of sockets.
> The intent is not to process requests totally asynchronously (overlapping block reads
> and forcing the client to reassemble a whole message out of a bunch of fragments), but rather
> to queue requests from the client to the datanode and process them serially, differing from
> the current implementation in that rather than use an exclusive socket for each file, only
> one socket is in use between the client and a particular datanode.
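
The proposal in the description (one connection per client/datanode pair, with requests queued and processed serially) can be sketched roughly as follows. This is an illustration only, not DFSClient code: the class PerDatanodeQueue is hypothetical, and a single-threaded executor stands in for the single socket to each datanode.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: names are illustrative and are not DFSClient APIs.
public class PerDatanodeQueue {
    // One single-threaded executor per datanode models the one shared socket:
    // requests for that datanode are queued and run one at a time, in order.
    private final Map<String, ExecutorService> lanes = new ConcurrentHashMap<>();

    // Queues a request for the given datanode, creating its lane on first use.
    public <T> Future<T> submit(String datanode, Callable<T> request) {
        ExecutorService lane =
            lanes.computeIfAbsent(datanode, d -> Executors.newSingleThreadExecutor());
        return lane.submit(request);
    }

    // Grows with the number of datanodes contacted, not the number of open files.
    public int connectionCount() {
        return lanes.size();
    }

    // Stops accepting new requests; already queued requests still run.
    public void shutdown() {
        lanes.values().forEach(ExecutorService::shutdown);
    }
}
```

In this model, thousands of open files spread over a few dozen datanodes would need only a few dozen lanes, at the cost of requests to the same datanode waiting behind one another.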
