hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access
Date Wed, 09 Mar 2016 19:38:40 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187755#comment-15187755 ]

Colin Patrick McCabe commented on HDFS-9924:

Currently, the NameNode can handle between 10k and 100k operations per second, depending on
configuration and the nature of the operations.  It seems like you should be able to comfortably
dispatch that many operations from a few thousand client threads performing synchronous RPCs,
bearing in mind that each operation takes a few milliseconds on average.  This assumes that
you want to consume all the available NN RPC bandwidth from a single client.
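
The thread-count estimate above is an application of Little's law: the number of requests in flight equals throughput multiplied by average latency. A minimal sketch of the arithmetic, assuming 100k ops/s (the upper end of the range quoted) and 5 ms per operation as one reading of "a few milliseconds"; both inputs are illustrative assumptions, not measurements, and client-side overheads are ignored.

```java
public class LittlesLaw {
    // Little's law: requests in flight = throughput (ops/s) * latency (s).
    // Latency is given in milliseconds, hence the division by 1000
    // (integer division, so fractional results are truncated).
    static long requiredConcurrency(long opsPerSecond, long latencyMillis) {
        return opsPerSecond * latencyMillis / 1000;
    }

    // Inverse: the most ops/s that N synchronous threads can sustain when
    // each thread has exactly one call outstanding at a time.
    static long maxThroughput(long threads, long latencyMillis) {
        return threads * 1000 / latencyMillis;
    }

    public static void main(String[] args) {
        // Assumed inputs: 100k ops/s target and 5 ms per operation.
        System.out.println("threads needed: " + requiredConcurrency(100_000, 5));
        System.out.println("ops/s from 2000 threads: " + maxThroughput(2000, 5));
    }
}
```

With these inputs, 500 synchronous threads already cover 100k ops/s, and 2,000 threads could in principle sustain 400k ops/s, well beyond the NameNode range quoted above.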

Perhaps I'm missing something, but I don't see how async operations will improve performance
here.  The overhead of a few thousand threads on the client is small, and it is certainly not what
limits HDFS performance.  Rather, performance is limited by factors such as locking on the
NameNode, Java garbage collection on the NameNode, and serialization/deserialization.

Please keep in mind that you don't need async operations to reuse connections and sockets;
we already do that via mechanisms like the {{PeerCache}} (formerly {{SocketCache}}).  Hive
can also dispatch operations in parallel using standard mechanisms such as an Executor or a
thread pool.  I certainly don't object to implementing this, but if the goal is better performance,
I think you are going to be disappointed.  Perhaps I have missed something, though; I'm
curious whether there are reasons for implementing this that I have not considered.
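
To illustrate the Executor approach mentioned above: a client can keep many ordinary blocking calls in flight by submitting them to a fixed-size thread pool. This is a sketch under stated assumptions: `blockingGetFileLength` is a hypothetical stand-in for a synchronous NameNode RPC (it only sleeps and returns the path length), not a real HDFS method, and the pool size, latency, and paths are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelDispatch {
    // Hypothetical stand-in for a synchronous RPC: blocks the calling
    // thread for latencyMillis, then returns a fake result.
    static long blockingGetFileLength(String path, long latencyMillis) throws InterruptedException {
        Thread.sleep(latencyMillis);
        return path.length();
    }

    // Dispatch all calls over a fixed pool; each worker blocks on one call
    // at a time, so at most poolSize calls are outstanding concurrently.
    static List<Long> fetchAll(List<String> paths, int poolSize, long latencyMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (String p : paths) {
                futures.add(pool.submit(() -> blockingGetFileLength(p, latencyMillis)));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> f : futures) {
                results.add(f.get()); // results are collected in submission order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            paths.add("/data/part-" + i);
        }
        long start = System.nanoTime();
        List<Long> lengths = fetchAll(paths, 50, 5);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(lengths.size() + " results in ~" + elapsedMs + " ms");
    }
}
```

The pool size bounds concurrency: with 50 threads and a simulated 5 ms latency, the 200 calls complete in roughly four rounds instead of 200 sequential waits.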

> [umbrella] Asynchronous HDFS Access
> -----------------------------------
>                 Key: HDFS-9924
>                 URL: https://issues.apache.org/jira/browse/HDFS-9924
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Xiaobing Zhou
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked until the
> method returns.  It is very slow if a client makes a large number of independent calls in
> a single thread since each call has to wait until the previous call is finished.  It is inefficient
> if a client needs to create a large number of threads to invoke the calls.
> We propose adding a new API to support asynchronous calls, i.e. the caller is not blocked.
> The methods in the new API immediately return a Java Future object.  The return value can
> be obtained by the usual Future.get() method.
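
The calling pattern in the proposal can be sketched with `CompletableFuture`: each call returns a `Future` immediately, and the caller collects results later with `Future.get()`. This is an illustration only: `AsyncLengthClient` and `getFileLength` are hypothetical names, not the API designed under HDFS-9924, and a single-threaded `ScheduledExecutorService` stands in for the RPC layer by completing each future when a simulated response "arrives".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class AsyncLengthClient implements AutoCloseable {
    // One thread simulating the RPC response handler; it completes futures
    // and never blocks on an individual call.
    private final ScheduledExecutorService responder =
            Executors.newSingleThreadScheduledExecutor();
    private final long latencyMillis;

    AsyncLengthClient(long latencyMillis) {
        this.latencyMillis = latencyMillis;
    }

    // Hypothetical async call: returns at once with an incomplete future
    // that is completed after the simulated round trip.
    public Future<Long> getFileLength(String path) {
        CompletableFuture<Long> result = new CompletableFuture<>();
        responder.schedule(() -> result.complete((long) path.length()),
                latencyMillis, TimeUnit.MILLISECONDS);
        return result;
    }

    @Override
    public void close() {
        responder.shutdown();
    }

    public static void main(String[] args) throws Exception {
        try (AsyncLengthClient client = new AsyncLengthClient(5)) {
            // Issue all calls from a single thread without waiting on any.
            List<Future<Long>> pending = new ArrayList<>();
            for (int i = 0; i < 200; i++) {
                pending.add(client.getFileLength("/data/part-" + i));
            }
            // Collect results with the usual Future.get().
            long total = 0;
            for (Future<Long> f : pending) {
                total += f.get();
            }
            System.out.println("total length: " + total);
        }
    }
}
```

Unlike the thread-pool approach, the number of outstanding calls here is not tied to the number of caller threads; whether that yields higher NameNode throughput is the question raised in the comment.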

This message was sent by Atlassian JIRA
