hadoop-hdfs-issues mailing list archives

From "Duo Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access
Date Tue, 14 Jun 2016 23:58:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330890#comment-15330890 ]

Duo Zhang commented on HDFS-9924:

My concern is that you cannot tell people that Hive is only compatible with hadoop-2.8.x.
For example, we set HBase to be compatible with hadoop-2.4+, so we usually optimize for
all hadoop-2.4+ versions if possible instead of using a new feature only introduced in a newer
version.

Here, a thread pool solution works for all hadoop-2.x versions. And a 1MB stack size per
thread is not that terrible. The stack is off-heap and only adds 1MB to VSZ, not RSS; RSS
grows on demand. And you can set a smaller stack size if you want to reduce the overhead.
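
As a rough illustration of the smaller stack size mentioned above, here is a minimal sketch of a fixed thread pool whose threads request a reduced stack. The class name, thread name prefix, and the 256KB figure are hypothetical and chosen only for the example; the JVM treats the requested stack size as a hint and may round or ignore it on some platforms.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SmallStackPool {
    // Hypothetical value: request 256KB per thread instead of the usual 1MB default.
    static final long STACK_SIZE_BYTES = 256 * 1024;

    // Builds a fixed-size pool whose worker threads ask for the reduced stack size.
    static ExecutorService newPool(int threads) {
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = r -> {
            // Thread(ThreadGroup, Runnable, String, long) takes the stack size as a hint.
            Thread t = new Thread(null, r, "hdfs-call-" + counter.incrementAndGet(),
                    STACK_SIZE_BYTES);
            t.setDaemon(true);
            return t;
        };
        return Executors.newFixedThreadPool(threads, factory);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = newPool(4);
        // Stand-in for a blocking call; a real client would invoke the file system here.
        Callable<String> task = () -> Thread.currentThread().getName();
        Future<String> result = pool.submit(task);
        System.out.println("ran on " + result.get());
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

The same effect can be had process-wide with the `-Xss` JVM option; the per-thread constructor only limits the change to this pool.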

For the implementation, what [~stack] said above is the experience we got from our write-ahead-log
implementation. And for the Hive case here, yes, you have a different pattern. But it is not
a good idea to wait on Futures sequentially. For example, say you have requests 0-99, request
1 is blocked for a long time, and requests 2-99 have all failed. With your solution, you will
block on request 1 for a long time before resubmitting the failed requests 2-99. This is an
inherent defect of lacking callback support. And a better solution is, sorry, but again, using
multiple threads. With a thread pool and {{CompletionService}}, you can (sometimes) get the
failed requests first.
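
To make the scenario above concrete, here is a minimal sketch using `ExecutorCompletionService`, which hands back futures in completion order rather than submission order. The `call` method is a hypothetical stand-in for a blocking HDFS call: request 0 succeeds, request 1 is slow, and every later request fails at once. The class and method names are assumptions for illustration only.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CompletionOrderDemo {
    // Hypothetical blocking call: request 1 is slow, requests >= 2 fail immediately.
    static String call(int id) throws Exception {
        if (id == 1) {
            Thread.sleep(500);
            return "ok-1";
        }
        if (id >= 2) {
            throw new IOException("request " + id + " failed");
        }
        return "ok-0";
    }

    // Returns the ids of requests whose failure was observed before request 1 finished.
    static List<Integer> failedBeforeSlowOneFinishes(int n) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        CompletionService<String> cs = new ExecutorCompletionService<>(pool);
        Map<Future<String>, Integer> ids = new HashMap<>();
        for (int i = 0; i < n; i++) {
            final int id = i;
            ids.put(cs.submit(() -> call(id)), id);
        }
        List<Integer> failedEarly = new ArrayList<>();
        boolean slowDone = false;
        for (int i = 0; i < n; i++) {
            Future<String> done = cs.take(); // next request to finish, in any order
            int id = ids.get(done);
            try {
                done.get();
                if (id == 1) {
                    slowDone = true;
                }
            } catch (ExecutionException e) {
                // A real client could resubmit the failed request right here.
                if (!slowDone) {
                    failedEarly.add(id);
                }
            }
        }
        pool.shutdown();
        return failedEarly;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("failed before request 1 finished: "
                + failedBeforeSlowOneFinishes(10));
    }
}
```

Waiting with `Future.get()` in submission order would instead sit on request 1 before any of the failures were seen. The ordering here still depends on thread scheduling, which is why the failed requests only sometimes come back first.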

Hope this could help. Thanks.

> [umbrella] Asynchronous HDFS Access
> -----------------------------------
>                 Key: HDFS-9924
>                 URL: https://issues.apache.org/jira/browse/HDFS-9924
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Xiaobing Zhou
>         Attachments: AsyncHdfs20160510.pdf
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked until the
> method returns.  It is very slow if a client makes a large number of independent calls in
> a single thread since each call has to wait until the previous call is finished.  It is inefficient
> if a client needs to create a large number of threads to invoke the calls.
> We propose adding a new API to support asynchronous calls, i.e. the caller is not blocked.
> The methods in the new API immediately return a Java Future object.  The return value can
> be obtained by the usual Future.get() method.

