hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3110) libhdfs implementation of direct read API
Date Mon, 02 Apr 2012 06:34:41 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244003#comment-13244003

Todd Lipcon commented on HDFS-3110:

+      if ((*env)->ExceptionCheck(env)) {
+        errno = errnoFromException(NULL, env, "JNIEnv::NewDirectByteBuffer");
+      }

>From my reading of the JNI spec, I think there should always be an exception pending if
it returns NULL. But, just in case, I think you should still set errno, so that the contract
for the user of this function is that errno is always set if it returns -1. Maybe ENOMEM?

errnoFromException: add an ENOTSUP case for UnsupportedOperationException, since that seems
quite likely


For "hot code path" functions like read, it may be a good optimization to use the JNI calls
directly, with cached methodids, rather than using the hdfsJniHelper functions. They seem
to be reasonably costly. But that's a separate optimization, so let's do it in another JIRA.

+          fprintf(stderr,
+                  "WARN: FSDataInputStream.read returned invalid return code - libhdfs returning
EOF, i.e., 0: %d\n",
+                  noReadBytes);
Please reformat to 80 columns

+        //This is a valid case: there aren't any bytes left to read!
I think this would be better as: "Returning -1 is valid -- it indicates EOF". And then change
the if condition to {{noReadyBytes != -1}}


Is there some way to change the code to automatically use hdfsReadDirect when the underlying
stream implements it? For example, when the file is opened, check whether it implements the
interface, and if so, stick a flag in the {{hdfsFile_internal}} struct. Or, if the interface
isn't enough (because of wrapped streams), have it try direct read on the first call to read,
and then set a flag and fallback to the slower read call if it gets UnsupportedOperationException?

I'm thinking that this is an implementation change, not an interface change, so making users
know about the new API is wrong.

Looks like the test shell script has a couple of hard tabs, please use two-space indentation
> libhdfs implementation of direct read API
> -----------------------------------------
>                 Key: HDFS-3110
>                 URL: https://issues.apache.org/jira/browse/HDFS-3110
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: libhdfs
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>             Fix For: 0.24.0
>         Attachments: HDFS-3110.0.patch, HDFS-3110.1.patch
> Once HDFS-2834 gets committed, we can add support for the new API to libhdfs, which leads
to significant performance increases when reading local data from C.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message