hadoop-hdfs-issues mailing list archives

From "Henry Robinson (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-2834) ByteBuffer-based read API for DFSInputStream
Date Fri, 02 Mar 2012 00:24:59 GMT

     [ https://issues.apache.org/jira/browse/HDFS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated HDFS-2834:
---------------------------------

    Attachment: HDFS-2834.patch

Here's a patch against trunk for this ticket. Tests (including new ones) pass for me locally.


What I've done here is:

a) Added a 'ByteBufferReadable' interface with a single read(ByteBuffer) API, and made FSDataInputStream
implement it
b) Implemented this interface in BlockReaderLocal (this is the most invasive change)
c) Implemented the interface in RemoteBlockReader2 (same number of copies as the byte[] path),
and stubbed it out in RemoteBlockReader
d) Added tests to TestShortCircuitLocalRead to exercise the BlockReaderLocal path
e) Split TestParallelRead into a driver class plus a couple of test suites to exercise remote
or local reads
f) Added support to libhdfs for the direct read path.
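
For readers who want a feel for the shape of (a) and (c), here is a minimal sketch. It is not taken from the patch: the interface below is a local stand-in declared only for illustration, ArrayBackedReader and UnsupportedReader are hypothetical classes, and throwing UnsupportedOperationException from the stub is an assumption about what "stubbed out" might look like.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

// Local stand-in for the interface described in (a): a single method that
// fills a caller-supplied buffer and returns the number of bytes read.
interface ByteBufferReadable {
    int read(ByteBuffer buf) throws IOException;
}

// Hypothetical in-memory reader, used only to show the contract.
// It is not one of the HDFS block readers.
class ArrayBackedReader implements ByteBufferReadable {
    private final byte[] data;
    private int pos = 0;

    ArrayBackedReader(byte[] data) {
        this.data = data;
    }

    @Override
    public int read(ByteBuffer buf) {
        if (pos >= data.length) {
            return -1; // end of stream
        }
        // Read at most as many bytes as the caller's buffer can still hold.
        int n = Math.min(buf.remaining(), data.length - pos);
        buf.put(data, pos, n); // written straight into the caller's buffer
        pos += n;
        return n;
    }
}

// Hypothetical stub for a reader that does not support the new path.
// The exception type is an assumption, not something the patch notes state.
class UnsupportedReader implements ByteBufferReadable {
    @Override
    public int read(ByteBuffer buf) {
        throw new UnsupportedOperationException("ByteBuffer read not supported");
    }
}
```

A caller would allocate its own buffer (for example with ByteBuffer.allocateDirect) and call read(buf) until it returns -1, clearing or compacting the buffer between calls.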

I'll circle back with benchmark numbers when I'm able (the ones that I've already run look
promising). 
                
> ByteBuffer-based read API for DFSInputStream
> --------------------------------------------
>
>                 Key: HDFS-2834
>                 URL: https://issues.apache.org/jira/browse/HDFS-2834
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>         Attachments: HDFS-2834.patch
>
>
> The {{DFSInputStream}} read path always copies bytes into a JVM-allocated {{byte[]}}.
> Although for many clients this is the desired behaviour, in certain situations, such as native reads
> through libhdfs, this imposes an extra copy penalty, since the {{byte[]}} needs to be copied
> out again into a natively readable memory area.
> For these cases, it would be preferable to allow the client to supply its own buffer,
> wrapped in a {{ByteBuffer}}, to avoid that final copy overhead.
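
The extra copy described in the issue can be illustrated with a small sketch. This is a toy model under stated assumptions, not HDFS code: CopyPaths, SOURCE, readViaByteArray and readViaByteBuffer are hypothetical names, and a direct ByteBuffer stands in for the "natively readable memory area".

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Toy model of the two read paths; all names here are hypothetical.
class CopyPaths {
    // Stand-in for the bytes a block reader would hand back.
    static final byte[] SOURCE = "hello hdfs".getBytes(StandardCharsets.US_ASCII);

    // byte[] path: the data lands in a heap array first and must then be
    // copied out again into the natively readable buffer.
    static int readViaByteArray(ByteBuffer nativeBuf) {
        byte[] heap = new byte[SOURCE.length];
        System.arraycopy(SOURCE, 0, heap, 0, heap.length); // copy 1: into byte[]
        nativeBuf.put(heap);                                // copy 2: out to native memory
        return heap.length;
    }

    // ByteBuffer path: the caller supplies the destination, so the data is
    // written into it once.
    static int readViaByteBuffer(ByteBuffer nativeBuf) {
        nativeBuf.put(SOURCE);                              // single copy
        return SOURCE.length;
    }
}
```

Both methods leave the same bytes in the destination buffer; the difference is only in how many times the data is moved to get there.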

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
