hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5191) revisit zero-copy API in FSDataInputStream to make it more intuitive
Date Wed, 18 Sep 2013 23:08:51 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771369#comment-13771369

Chris Nauroth commented on HDFS-5191:

I reviewed the current API and caught up on all prior discussion here and in HDFS-4953.  [~cmccabe],
thank you for incorporating the feedback.  I'm going to focus on the interface here.  I'll
review the implementation details soon too.  I like to review APIs by writing a little code
against them.  This is what I came up with (imports trimmed for brevity):

class Main extends Configured implements Tool {
  private static final int BUFFER_MAX_LENGTH = 4 * 1024 * 1024; // 4 MB
  private static final Log LOG = LogFactory.getLog(Main.class);

  public int run(String[] args) {
    FileSystem fs = null;
    FSDataInputStream is = null;

    try {
      fs = FileSystem.get(this.getConf());
      is = fs.open(new Path(args[0]));
      ByteBufferFactory bbf = new SimpleByteBufferFactory();
      EnumSet<ReadOption> readOpts = EnumSet.noneOf(ReadOption.class);
      for (;;) {
        ByteBuffer bb = is.read(bbf, BUFFER_MAX_LENGTH, readOpts);
        if (bb == null) break; // EOF
        // Use data from the buffer here.
    } catch (IOException e) {
      LOG.error("Unexpected IOException", e);
    } finally {
      IOUtils.cleanup(LOG, is, fs);

    return 0;

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Main(), args));

Can someone check that this code (written by someone looking at the API for the first time)
is correct?  If so, then that's a good sign that we're on the right track.  :-)  I think this
code sample shows that most of the earlier feedback has been addressed.  Specifically:

# It's a single interface for client read code for both the cached and uncached case.  I didn't
need to check flags or catch a special exception or downcast to an HDFS class to handle copying
vs. zero-copy.
# There is generally less code for the client to write, because there are fewer things that
the client needs to control.  (i.e. no {{setAllowShortReads}})
# I did not need to preallocate a fallback buffer that wouldn't be used in the mmap'd case.
# I did not need to catch exceptions for main flow control.
# The {{ByteBufferFactory}} interface would allow the client to control ownership of direct
buffers for fallback (and the {{SimpleByteBufferFactory}} ships a default implementation that
does this).
# There is no sharing of mutable state between implicitly connected objects (streams and cursors).
# It looks close to the kind of code sample [~owen.omalley] wanted to achieve in the comments
of HDFS-4953.
# There had been discussion of returning array of {{ByteBuffer}} vs. returning a single {{ByteBuffer}}
with possibility of short read.  My preference is for the latter.  The former would have required
me to write another nested loop.  I need to code for short read anyway in case the final read
before EOF doesn't match my desired read length.

I think the last remaining open question is around the need for a client to check if zero-copy
is available and potentially run a different code path.  One way we could achieve that now
is by adding a {{RejectingByteBufferFactory}} that always throws, and clients that want that
behavior can use it instead of {{SimpleByteBufferFactory}}.  Does anyone have a concrete use
case for this?  Without a concrete use case, it's hard to say if the interface is sufficient.

Here are some suggestions, mostly minor adjustments on top of the current patch:

# Shall we add overloads of {{FSDataInputStream#read}} that don't require the caller to pass
{{ByteBufferFactory}} (assumes caller wants a new {{SimpleByteBufferFactory}}) and don't require
the caller to pass read opts (assumes none)?
# Can we rename {{ByteBufferFactory}} to {{ByteBufferPool}}?  [~vicaya] had made a similar
suggestion.  "Factory" implies per-call instance creation and "pool" communicates that it
needs to control the lifecycle of instances.
# Can we move those classes to the .io sub-package?  They aren't coupled to anything in .fs.
# We need to fully document the expected contract of {{ByteBufferPool}} for implementers.
 Some factors to consider:
## Thread-safety - Is it required for the implementation to guarantee thread-safety internally?
 (I assume thread-safety is only required if the caller intends to use the same one from multiple
threads.  Whatever the answer, let's document it.)
## Null - Is null a legal return value?  Is it possible for callers to pass null arguments?
 (Ideally, null is forbidden, but whatever the answer, let's document it.)
## Bounds-checking - What should implementers do if given a negative length?  (Runtime exception?)
# When the {{FSDataInputStream}} is not {{HasEnhancedByteBufferAccess}}, we try to fallback
to a copying read.  However, this failed when I tested my sample code against a local file
system, because it ultimately still called {{FSDataInputStream#read(ByteBuffer)}}, which throws
{{UnsupportedOperationException}} in the base class.  We'll need to fix this to achieve the
goal of clients writing a single code path that works for any {{FileSystem}}.
> revisit zero-copy API in FSDataInputStream to make it more intuitive
> --------------------------------------------------------------------
>                 Key: HDFS-5191
>                 URL: https://issues.apache.org/jira/browse/HDFS-5191
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client, libhdfs
>    Affects Versions: HDFS-4949
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-5191-caching.001.patch
> As per the discussion on HDFS-4953, we should revisit the zero-copy API to make it more
intuitive for new users.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

View raw message