hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6065) HDFS zero-copy reads should return empty buffer on EOF
Date Thu, 06 Mar 2014 19:56:52 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922976#comment-13922976

Chris Nauroth commented on HDFS-6065:

bq. Why return empty buffer vs raise EOF?

I agree with Colin.  Raw byte reading methods tend to return a sentinel value to indicate
EOF.  The {{EOFException}} tends to be used only for things like {{DataInputStream#readLong}},
where you might exhaust the stream before reading a full 8 bytes, and there is no reasonable
sentinel value to use in the range of possible return values.  The zero-copy API is a raw
byte reading interface.

The patch mostly looks good, but it turns out that it breaks a snippet of code that I've been
using as a simple example of using zero-copy:

  public int run(String[] args) {
    FileSystem fs = null;
    FSDataInputStream is = null;

    try {
      fs = FileSystem.get(this.getConf());
      is = fs.open(new Path(args[0]));
      ByteBufferPool bbp = args.length > 1 && Boolean.valueOf(args[1]) ?
        new ElasticByteBufferPool() : null;
      EnumSet<ReadOption> readOpts = EnumSet.of(ReadOption.SKIP_CHECKSUMS);
      for (;;) {
        ByteBuffer bb = is.read(bbp, BUFFER_MAX_LENGTH, readOpts);
        //if (bb == null) break; // EOF
        if (!bb.hasRemaining()) break; // EOF
        byte[] bytes = new byte[bb.remaining()];
        System.out.println(new String(bytes, "UTF-8"));
    } catch (IOException e) {
    } finally {
      IOUtils.cleanup(null, is, fs);

    return 0;

Note the commented out EOF line.  Previously, I used null to check for EOF, and that worked.
 With this patch, we stop getting nulls returned, so the code goes into an infinite loop.
 I needed to switch to checking {{hasRemaining}}.  This is only a problem when using fallback
buffers.  The existing code in {{ByteBufferUtil#fallbackRead}} returned null to indicate EOF:

    } finally {
      if (!success) {
        // If we got an error while reading, or if we are at EOF, we 
        // don't need the buffer any more.  We can give it back to the
        // bufferPool.
        buffer = null;

Considering zero-copy already shipped in 2.3.0, I think we need to consider backwards-compatibility.
 The solutions I can think of are:
# We play a bit of revisionist history and declare that my code sample was always buggy in
the first place.  :-)
# Change the code to keep returning null for EOF only if the caller is using fallback buffers.
 I don't like this very much, because the caller could be forced to check both null and hasRemaining
if their buffer handling code is decoupled from their reading code (i.e. background buffer
handling threads).
# Change this patch to use null instead of empty buffer for its sentinel value.

Overall, I favor #3 for consistency, even though I normally prefer to avoid null returns.
 Colin, what do you think?

> HDFS zero-copy reads should return empty buffer on EOF 
> -------------------------------------------------------
>                 Key: HDFS-6065
>                 URL: https://issues.apache.org/jira/browse/HDFS-6065
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.4.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-6065.001.patch
> It would be nice if the HDFS zero-copy reads mechanism returned an empty buffer on EOF.
 Currently, it throws UnsupportedOperationException on EOF when using zero-copy reads, which
seems confusing.

This message was sent by Atlassian JIRA

View raw message