hadoop-hdfs-issues mailing list archives

From "Henry Robinson (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2834) ByteBuffer-based read API for DFSInputStream
Date Mon, 05 Mar 2012 08:30:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222216#comment-13222216 ]

Henry Robinson commented on HDFS-2834:
--------------------------------------

Here are some initial benchmark numbers. They need a little explaining.

I ran 16 experiments in total, varying the read path (copying or direct), the kind of checksum
used (native, non-native or none) and the locality (short-circuit or remote-to-same-machine),
although not in every combination. All measurements were taken through libhdfs, which explains
a couple of oddities in the performance. Once I get a little time, I'll try to write a native
Java benchmark, but the relative results should be quite similar.

*Configuration*

The test reads the first 512MB of a 2GB file from a MiniDFSCluster running on the same machine.
Each configuration was run 50 times, and the first 5 runs were discarded; the remaining runs
were averaged. The file was read from buffer cache on a machine with 16GB RAM and 8 i7 cores.
You can see the code here: https://gist.github.com/1977470
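
As a rough illustration of the run-averaging described above (this is not the code from the gist), here is a minimal Python sketch. It assumes the per-run throughputs are what get averaged and that 1 MB means 2**20 bytes; the function names are hypothetical.

```python
from statistics import mean

BYTES_READ = 512 * 1024 * 1024  # first 512MB of the file, as in the test
WARMUP_RUNS = 5                 # the first 5 of the 50 runs are discarded


def throughput_mb_per_s(elapsed_seconds, bytes_read=BYTES_READ):
    """Throughput of a single run in MB/s (assumes 1 MB = 2**20 bytes)."""
    return bytes_read / (1024 * 1024) / elapsed_seconds


def summarize(run_seconds, warmup=WARMUP_RUNS):
    """Drop the warm-up runs, then average the throughput of the rest."""
    measured = run_seconds[warmup:]
    if not measured:
        raise ValueError("no runs left after discarding warm-up runs")
    return mean(throughput_mb_per_s(t) for t in measured)
```

Discarding the first runs keeps one-off costs such as JVM start-up and cache warming out of the average.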

*Read sizes*

The size of each requested read is an important variable. When performing a checksum, the
maximum read size is bounded by the number of checksums that BlockReaderLocal can fit into its
internal buffer. Prior to this patch, this capped the size of a single read at 32k. In
a revision of this patch which I'll upload shortly, I've made this buffer size configurable.
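
To make the bound concrete, here is a small Python sketch of the arithmetic. The 4-byte checksum and 512 bytes of data per checksum are assumptions for illustration, not values taken from the patch, and the function name is hypothetical.

```python
def max_checksummed_read(checksum_buffer_bytes,
                         bytes_per_checksum=512,  # assumed chunk size
                         checksum_size=4):        # assumed CRC width in bytes
    """Largest read (in bytes) whose checksums all fit in the checksum buffer."""
    checksums_that_fit = checksum_buffer_bytes // checksum_size
    return checksums_that_fit * bytes_per_checksum
```

Under these assumptions, a 32k limit corresponds to room for 64 checksums (256 bytes), and a 1MB limit to room for 2048 checksums (8192 bytes).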

I ran all the experiments in two configurations - with a 32k read buffer, and a 1MB one. The
size of the requested read was also fixed to 32k and 1MB respectively. When not performing
checksums, but doing a shortcircuit read, there is no limit on the size of a single read,
but for comparison these experiments were run with 32k and 1MB reads as well. 

Finally, remote reads are limited to 64k in size. Again, I ran the experiment with both read
sizes. With 1MB requests, the copying read path is extremely slow for remote reads.
This is because of excessive memory allocation inside libhdfs, which
allocates a 1MB byte[] for each 64k read. This illustrates one of the confounding effects
of measuring performance through libhdfs, and the dangers of not correctly matching your read
size to the size of the read the BlockReader implementation is able to return.
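
The allocation overhead described above can be estimated with a short Python sketch. It assumes every call returns the full capped amount and that one buffer of the requested size is allocated per call; the function name is hypothetical.

```python
def copying_read_allocation(total_bytes, requested_read, max_returned):
    """Return (number of read calls, total bytes allocated for buffers)."""
    returned_per_call = min(requested_read, max_returned)
    calls = -(-total_bytes // returned_per_call)  # ceiling division
    allocated = calls * requested_read            # one buffer per call
    return calls, allocated


KB = 1024
MB = 1024 * KB

if __name__ == "__main__":
    for request in (1 * MB, 32 * KB):
        calls, allocated = copying_read_allocation(512 * MB, request, 64 * KB)
        print(f"{request // KB}k requests: {calls} calls, {allocated // MB} MB allocated")
```

For the 512MB test with 1MB requests capped at 64k, this gives 8192 calls and 8192 MB of buffer allocation, 16 times the data actually read. With 32k requests the allocation matches the data read.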

*Results*

(All values are throughput measured in MB/s)

|| ||Native Checksums||No Checksums||Non-native Checksums||Remote, Native Checksums||
|Direct (MB/s) - 1MB buffer and request size|3834.25|4665.05|867.06|2057.17|
|Copying (MB/s) - 1MB buffer and request size|1976.09|1650.15|754.97|394.91|
|Direct (MB/s) - 32k buffer and request size|2943.02|3695.37|816.22|1925.03|
|Copying (MB/s) - 32k buffer and request size|2010.21|2290.50|721.52|1412.20|

... and in pretty picture form:

!hdfs-2834-libhdfs-benchmark.png!
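
For readers who want the direct-versus-copying ratios implied by the table above, here is a minimal Python sketch. The throughput values are copied from the table; the dictionary layout and function name are only illustrative.

```python
# Throughput in MB/s, copied from the results table.
COLUMNS = [
    "Native Checksums",
    "No Checksums",
    "Non-native Checksums",
    "Remote, Native Checksums",
]
RESULTS = {
    ("direct", "1MB"): [3834.25, 4665.05, 867.06, 2057.17],
    ("copying", "1MB"): [1976.09, 1650.15, 754.97, 394.91],
    ("direct", "32k"): [2943.02, 3695.37, 816.22, 1925.03],
    ("copying", "32k"): [2010.21, 2290.50, 721.52, 1412.20],
}


def speedup(size, column):
    """Ratio of direct to copying throughput for one table cell."""
    i = COLUMNS.index(column)
    return RESULTS[("direct", size)][i] / RESULTS[("copying", size)][i]


if __name__ == "__main__":
    for size in ("1MB", "32k"):
        for column in COLUMNS:
            print(f"{size:>3}  {column:<26} {speedup(size, column):.2f}x")
```

On these numbers the direct path is faster in every cell, from roughly 1.1x with non-native checksums to roughly 5.2x for 1MB remote reads, where the copying path also pays the allocation penalty described earlier.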


                
> ByteBuffer-based read API for DFSInputStream
> --------------------------------------------
>
>                 Key: HDFS-2834
>                 URL: https://issues.apache.org/jira/browse/HDFS-2834
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>         Attachments: HDFS-2834-no-common.patch, HDFS-2834.3.patch, HDFS-2834.4.patch,
>                      HDFS-2834.5.patch, HDFS-2834.6.patch, HDFS-2834.patch, HDFS-2834.patch,
>                      hdfs-2834-libhdfs-benchmark.png
>
>
> The {{DFSInputStream}} read-path always copies bytes into a JVM-allocated {{byte[]}}.
> Although for many clients this is desired behaviour, in certain situations, such as native-reads
> through libhdfs, this imposes an extra copy penalty since the {{byte[]}} needs to be copied
> out again into a natively readable memory area.
> For these cases, it would be preferable to allow the client to supply its own buffer,
> wrapped in a {{ByteBuffer}}, to avoid that final copy overhead.
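
The difference between the two read paths in the description above can be illustrated by analogy. The following Python sketch is not the HDFS API: it uses `io.BytesIO` as a stand-in stream, and the function names are hypothetical. The copying path receives a freshly allocated object and copies it again into the caller's buffer, while the direct path lets the stream fill a caller-supplied buffer in place.

```python
import io


def copying_read(stream, dest, n):
    """Read via an intermediate object, then copy into the caller's buffer."""
    data = stream.read(n)      # the stream allocates a new bytes object
    dest[:len(data)] = data    # second copy, into the caller's buffer
    return len(data)


def direct_read(stream, dest):
    """Let the stream fill the caller-supplied buffer in place."""
    return stream.readinto(dest)


if __name__ == "__main__":
    payload = bytes(range(256)) * 4
    buf = bytearray(len(payload))
    print(direct_read(io.BytesIO(payload), buf), bytes(buf) == payload)
```

Both functions leave the same bytes in the destination buffer; the direct variant just skips the intermediate allocation and copy.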


        
