hadoop-hdfs-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access
Date Wed, 15 Jun 2016 21:52:09 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332670#comment-15332670 ]

stack commented on HDFS-9924:

bq. Quoting Tsz Wo Nicholas Sze words, I understand your concern but it is a different problem.
We should not protect NN by making the client slow. We should add protection in NN instead

The above quote is magical thinking (see the response to it given by Daryn, an
operator of one of our largest deploys). We are talking branch-2 here for this Future hack.
The NN is not going to sprout scale all of a sudden in the branch-2 line to support 'thousands'
of concurrent ops coming in from an adjacent, blame-shifting Hive metadata server. Some form
of parsimony, concern for NN loading, is in order.

Rereading this issue from the top down (including the design doc -- it needs numbers... what
is a large number of calls?; why wouldn't a thread pool work given you need to throttle) and
seeing where we have arrived, this issue is not about 'Asynchronous HDFS Access' as the summary
and original description advertise but instead is an expedient hack-for-hive, for late in
branch-2 only. The 'change' will have a short shelf-life, it seems, given it arrives in 2.9.0+
(?) and branch-3 is looking to be a different API (see discussion on HADOOP-12910). The two
distinct positions I discern in the discussion so far -- those who want a true async API on
HDFS and those working on a hive fix -- are having trouble finding common ground. If this
characterization is correct, I'd suggest we just call this issue a hack-for-hive explicitly
and annotate it as such. A good few of the participants in this issue are likely not much
interested in the latter (e.g. myself) as long as this work does not get in the way of our
having a 'real' async API (HADOOP-12910) or confuse downstreamers on what the async story
on HDFS is.
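
To make the thread-pool question concrete, here is a minimal sketch of what a client-side throttle could look like. It is not taken from the design doc or any patch on this issue: blockingCall is a hypothetical stand-in for a blocking metadata operation (it is not an HDFS API), and the pool size and call count are arbitrary. A fixed-size pool caps how many calls are in flight at once while the caller still collects results through ordinary Java Futures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ThrottledCalls {
    // Counters used only to observe how many calls overlap.
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger maxInFlight = new AtomicInteger();

    // Hypothetical stand-in for a blocking metadata call; NOT an HDFS API.
    static String blockingCall(int id) throws InterruptedException {
        int now = inFlight.incrementAndGet();
        maxInFlight.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(10); // simulate the round trip to the NN
            return "done-" + id;
        } finally {
            inFlight.decrementAndGet();
        }
    }

    // Submit `calls` independent operations, with at most `maxConcurrent`
    // running at any moment. The pool size is the throttle.
    static List<String> runAll(int calls, int maxConcurrent) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < calls; i++) {
                final int id = i;
                Callable<String> task = () -> blockingCall(id);
                futures.add(pool.submit(task)); // returns immediately
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // blocks until that call has finished
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> results = runAll(100, 8);
        System.out.println(results.size() + " calls, peak concurrency "
            + maxInFlight.get());
    }
}
```

The submitting thread is never blocked per call, yet the number of concurrent operations reaching the server never exceeds the pool size, which is the kind of parsimony argued for above.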

> [umbrella] Asynchronous HDFS Access
> -----------------------------------
>                 Key: HDFS-9924
>                 URL: https://issues.apache.org/jira/browse/HDFS-9924
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Xiaobing Zhou
>         Attachments: AsyncHdfs20160510.pdf
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked until the
> method returns.  It is very slow if a client makes a large number of independent calls in
> a single thread since each call has to wait until the previous call is finished.  It is inefficient
> if a client needs to create a large number of threads to invoke the calls.
> We propose adding a new API to support asynchronous calls, i.e. the caller is not blocked.
> The methods in the new API immediately return a Java Future object.  The return value can
> be obtained by the usual Future.get() method.
