hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access
Date Wed, 11 May 2016 18:52:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280617#comment-15280617 ]

Colin Patrick McCabe commented on HDFS-9924:
--------------------------------------------

With regard to error handling, why not handle all errors as exceptions thrown from {{Future#get}}?
 Handling some errors in a different way because they happened "earlier" (let's say, on the
client side rather than server side) forces the client to put error checking code in two places.
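
As a rough illustration of what a single error path could look like, here is a minimal sketch. It assumes a hypothetical {{rename}} helper (not the API proposed in the design document) whose remote call is simulated locally. Both the client-side check and the simulated server-side failure are reported through {{Future#get}}, so the caller needs only one {{catch}} block.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical stand-in for an async client; NOT the HDFS-9924 API.
class AsyncRenameSketch {

    /** Never throws directly: every failure is delivered through the returned Future. */
    static Future<Boolean> rename(String src, String dst) {
        CompletableFuture<Boolean> result = new CompletableFuture<>();
        if (src == null || dst == null || src.isEmpty() || dst.isEmpty()) {
            // "Early" client-side failure, reported the same way as a remote one.
            result.completeExceptionally(new IOException("invalid path"));
            return result;
        }
        // Simulated remote call (assumption: a real client would go through the IPC layer).
        CompletableFuture.runAsync(() -> {
            if (src.equals(dst)) {
                result.completeExceptionally(
                    new IOException("source and destination are the same: " + src));
            } else {
                result.complete(true);
            }
        });
        return result;
    }

    /** The only place the caller needs error-handling code. */
    static String describe(Future<Boolean> future) {
        try {
            return "ok:" + future.get();
        } catch (ExecutionException e) {
            return "failed:" + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }
}
```

With this shape, {{describe(rename("", "/b"))}} and {{describe(rename("/a", "/a"))}} take the same {{ExecutionException}} branch even though one failure is detected before any call is issued.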

Does the {{Future#get}} callback get made without holding any locks?  Can other asynchronous
calls be made from this context?

{code}
public boolean rename(Path src, Path dst) throws IOException {
  if (isAsynchronousMode()) {
    return getFutureDistributedFileSystem().rename(src, dst).get();
  } else {
    // ... current implementation
  }
}
{code}
It seems concerning that we would have to make such a large change to the synchronous {{DistributedFileSystem}}
code.  This would also result in more GC load since we'd be creating lots of {{Future}} objects.
 Shouldn't it be possible to avoid this?  I do not think having some kind of global async
bit is a good idea.

bq. In order to avoid client abusing the server by asynchronous calls. The RPC client should
have a configurable limit in order to limit the outstanding asynchronous calls. The caller
may be blocked if the number of outstanding calls hits the limit so that the caller is slowed
down.

Blocking the client seems like it could be problematic for code which expects to be asynchronous.
 There should be an option to throw an exception in this case.
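
One possible shape for such an option, sketched under assumptions: the class name {{AsyncCallLimiter}} and the {{failFast}} flag are hypothetical, and {{RejectedExecutionException}} is simply a convenient JDK exception, not something the design document specifies. A semaphore bounds the outstanding calls; in fail-fast mode the caller gets an exception instead of being blocked.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical limiter for outstanding async calls; names are illustrative only.
class AsyncCallLimiter {
    private final Semaphore permits;
    private final boolean failFast;

    AsyncCallLimiter(int maxOutstanding, boolean failFast) {
        this.permits = new Semaphore(maxOutstanding);
        this.failFast = failFast;
    }

    /** Starts the call if a permit is available; otherwise blocks or throws, per configuration. */
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> call) {
        if (failFast) {
            if (!permits.tryAcquire()) {
                throw new RejectedExecutionException("too many outstanding asynchronous calls");
            }
        } else {
            // Blocking behavior, as described in the quoted design text.
            permits.acquireUninterruptibly();
        }
        CompletableFuture<T> future;
        try {
            future = call.get();
        } catch (RuntimeException e) {
            permits.release(); // do not leak the permit if the call could not be started
            throw e;
        }
        // Return the permit once the call finishes, successfully or not.
        future.whenComplete((value, error) -> permits.release());
        return future;
    }

    int available() {
        return permits.availablePermits();
    }
}
```

A caller that wants to stay non-blocking would construct the limiter with {{failFast = true}} and treat the exception as backpressure.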

I also think that we could maintain a queue of async calls that we have not submitted to the
IPC layer yet, to avoid being limited by issues at the IPC layer.
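
A minimal sketch of that idea, with assumed names ({{PendingCallQueue}}, {{maxInFlight}}) and with the IPC layer stood in for by any supplier of a {{CompletableFuture}}: calls beyond the in-flight limit are parked in a client-side queue and started as earlier calls complete, so {{submit}} neither blocks nor rejects.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical client-side queue in front of the IPC layer; illustrative only.
class PendingCallQueue {
    private final int maxInFlight;
    private int inFlight = 0;
    private final Queue<Runnable> pending = new ArrayDeque<>();

    PendingCallQueue(int maxInFlight) {
        this.maxInFlight = maxInFlight;
    }

    /** Returns immediately; the call starts now or when an in-flight slot frees up. */
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> call) {
        CompletableFuture<T> result = new CompletableFuture<>();
        Runnable start = () -> call.get().whenComplete((value, error) -> {
            if (error != null) {
                result.completeExceptionally(error);
            } else {
                result.complete(value);
            }
            onCallFinished();
        });
        synchronized (this) {
            if (inFlight >= maxInFlight) {
                pending.add(start); // not yet handed to the IPC layer
                return result;
            }
            inFlight++;
        }
        start.run(); // started outside the lock
        return result;
    }

    /** Hands the freed slot to the next queued call, if any. */
    private void onCallFinished() {
        Runnable next;
        synchronized (this) {
            next = pending.poll();
            if (next == null) {
                inFlight--;
                return;
            }
        }
        next.run(); // the slot is reused, so inFlight is unchanged
    }

    synchronized int queued() {
        return pending.size();
    }
}
```

The sketch leaves the queue unbounded and does not handle a supplier that throws; a real client would probably need a cap on the queue as well.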

bq. Support asynchronous FileContext (client API)

{{AsynchronousFileSystem}} is a separate API from {{FileSystem}}.  If there are issues with
{{FileSystem}}, surely we can fix them in {{AsynchronousFileSystem}} rather than creating
a fourth API?

bq. Use Java 8’s new language feature in the API (client API).

Given that Hadoop 3.x will probably require Java 8 (based on the mailing list discussion), why
not just make the async API use JDK 8's {{CompletableFuture}} from day 1, rather than hacking
it in later?
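
To illustrate what {{CompletableFuture}} offers over a plain {{Future}}, here is a small sketch. The {{mkdirs}} and {{rename}} methods are hypothetical, locally simulated operations, not a proposed API. The follow-up call is chained with {{thenCompose}} rather than issued after a blocking {{get}}, and a failure at any stage falls through to one handler.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical async operations, simulated locally; NOT the HDFS API.
class ChainedCallsSketch {

    static CompletableFuture<Boolean> mkdirs(String dir) {
        return CompletableFuture.supplyAsync(() -> {
            if (dir.isEmpty()) {
                throw new IllegalArgumentException("empty directory name");
            }
            return true;
        });
    }

    static CompletableFuture<Boolean> rename(String src, String dst) {
        return CompletableFuture.supplyAsync(() -> !src.equals(dst));
    }

    /** Creates the directory, then renames into it, without blocking in between. */
    static CompletableFuture<String> moveInto(String src, String dir) {
        return mkdirs(dir)
            // Issue the second async call from the completion of the first.
            .thenCompose(created -> rename(src, dir + "/" + src))
            .thenApply(renamed -> renamed ? "moved" : "not moved")
            // A failure at any earlier stage ends up here.
            .exceptionally(error -> "failed");
    }
}
```

With a plain {{Future}}, the same sequence would need a blocking {{get}} between the two calls or a dedicated thread to wait on it.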

> [umbrella] Asynchronous HDFS Access
> -----------------------------------
>
>                 Key: HDFS-9924
>                 URL: https://issues.apache.org/jira/browse/HDFS-9924
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Xiaobing Zhou
>         Attachments: AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked until the
> method returns.  It is very slow if a client makes a large number of independent calls in
> a single thread since each call has to wait until the previous call is finished.  It is inefficient
> if a client needs to create a large number of threads to invoke the calls.
> We propose adding a new API to support asynchronous calls, i.e. the caller is not blocked.
> The methods in the new API immediately return a Java Future object.  The return value can
> be obtained by the usual Future.get() method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


