hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4953) enable HDFS local reads via mmap
Date Sat, 07 Sep 2013 00:33:54 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760804#comment-13760804 ]

Colin Patrick McCabe commented on HDFS-4953:

I've been thinking about this, and I think it might be possible to improve on the current

Maybe all we need is something like this:

in DFSInputStream:
  ZeroBuffer readZero(ByteBuffer fallback, int maxLength);

in ZeroBuffer:
  implements Closeable (for close)
  implements eof() (returns true if there are no more bytes to read)
  implements all ByteBuffer methods by forwarding them to the enclosed ByteBuffer

This API would be implemented for every filesystem, not just HDFS.

The constraints here would be:
* maxLength >= 0
* you can't reuse a fallback buffer until you close the associated ZeroBuffer (we can enforce
this by throwing an exception if the buffer is reused too early)
* ZeroBuffers are immutable once created, until you call close on them.
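
To make the proposal and its constraints concrete, here is a rough sketch of what the fallback (non-mmap) path might look like on a filesystem that cannot do zero-copy. Everything except the names readZero, ZeroBuffer, eof and close is an assumption made for illustration: FallbackZeroReader is a hypothetical stand-in for an input stream class, only two ByteBuffer methods are forwarded, and the fallback buffer is assumed to be heap-backed. It is not code from any of the attached patches.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical sketch of the proposed ZeroBuffer; not an actual Hadoop class. */
final class ZeroBuffer implements Closeable {
  private final ByteBuffer data;   // read-only view, so callers cannot change the contents
  private final boolean eof;
  private final Runnable onClose;  // releases the fallback buffer for reuse
  private boolean closed;

  ZeroBuffer(ByteBuffer data, boolean eof, Runnable onClose) {
    this.data = data.asReadOnlyBuffer();
    this.eof = eof;
    this.onClose = onClose;
  }

  /** True if the stream had no more bytes to read. */
  boolean eof() { return eof; }

  // Two forwarded ByteBuffer methods; a full version would forward the rest.
  int remaining() { checkOpen(); return data.remaining(); }
  byte get() { checkOpen(); return data.get(); }

  private void checkOpen() {
    if (closed) throw new IllegalStateException("ZeroBuffer is closed");
  }

  @Override
  public void close() {
    if (!closed) {
      closed = true;
      onClose.run();
    }
  }
}

/** Hypothetical fallback path: copies bytes into the caller's fallback buffer. */
final class FallbackZeroReader {
  private final InputStream in;
  // Identity-based set: ByteBuffer.equals() compares contents, which is not wanted here.
  private final Set<ByteBuffer> inUse =
      Collections.newSetFromMap(new IdentityHashMap<>());

  FallbackZeroReader(InputStream in) { this.in = in; }

  ZeroBuffer readZero(ByteBuffer fallback, int maxLength) throws IOException {
    if (maxLength < 0) {
      throw new IllegalArgumentException("maxLength must be >= 0");
    }
    if (inUse.contains(fallback)) {
      throw new IllegalStateException(
          "fallback buffer is still owned by an open ZeroBuffer");
    }
    // Assumes a heap-backed fallback buffer (fallback.hasArray() is true).
    int want = Math.min(maxLength, fallback.capacity());
    int n = in.read(fallback.array(), fallback.arrayOffset(), want);
    boolean eof = n < 0;           // InputStream.read returns -1 at end of stream
    fallback.clear();
    fallback.limit(Math.max(n, 0));
    inUse.add(fallback);
    return new ZeroBuffer(fallback, eof, () -> inUse.remove(fallback));
  }
}
```

A caller would use try-with-resources on the returned ZeroBuffer, so the fallback buffer is released as soon as the block exits. An mmap-capable implementation could return a ZeroBuffer wrapping a mapped region instead and ignore the fallback buffer entirely.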

This gets rid of a few of the awkward issues with the current API, which I think are:
* the current API requires users to special-case HDFS (since other FSes throw ZeroCopyUnavailableException)
* the current API shares the file position between the cursors and the stream, which is unintuitive.
* the current API puts the read call inside the cursor object, which is different from the
other read methods.

> enable HDFS local reads via mmap
> --------------------------------
>                 Key: HDFS-4953
>                 URL: https://issues.apache.org/jira/browse/HDFS-4953
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>    Affects Versions: 2.3.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>             Fix For: HDFS-4949
>         Attachments: benchmark.png, HDFS-4953.001.patch, HDFS-4953.002.patch, HDFS-4953.003.patch,
> HDFS-4953.004.patch, HDFS-4953.005.patch, HDFS-4953.006.patch, HDFS-4953.007.patch, HDFS-4953.008.patch
>
> Currently, the short-circuit local read pathway allows HDFS clients to access files directly
> without going through the DataNode.  However, all of these reads involve a copy at the operating
> system level, since they rely on read(), pread(), and related kernel interfaces.
> We would like to enable HDFS to read local files via mmap.  This would enable truly zero-copy
> In the initial implementation, zero-copy reads will only be performed when checksums
> are disabled.  Later, we can use the DataNode's cache awareness to only perform zero-copy
> reads when we know that the checksum has already been verified.
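
For readers unfamiliar with the mechanism the issue description refers to, the following is a minimal sketch of reading a local file through mmap using plain Java NIO. It is not HDFS code: MmapReadSketch and mapReadOnly are hypothetical names, and the sketch does no checksum verification, which is the situation the description limits the initial implementation to.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustrative only: maps a local file read-only instead of calling read(). */
final class MmapReadSketch {
  static MappedByteBuffer mapReadOnly(Path file) throws IOException {
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
      // The mapping stays valid after the channel is closed.
      // A single mapping is limited to Integer.MAX_VALUE bytes.
      return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
    }
  }
}
```

Bytes are then read straight out of the returned buffer (for example with get(int index)), rather than being copied into a user-space array by a read() call.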

