hadoop-mapreduce-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4651) Benchmarking random reads with DFSIO
Date Wed, 12 Sep 2012 08:12:07 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453821#comment-13453821 ]

Konstantin Shvachko commented on MAPREDUCE-4651:
------------------------------------------------

The idea is to use HDFS positional read, which is defined by {{PositionedReadable}} and
allows reading a segment of data from a given position.
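
To illustrate what a positional read looks like, here is a minimal sketch. It does not use HDFS: it substitutes the JDK's {{FileChannel.read(ByteBuffer, long)}} on a local file as a stand-in for {{PositionedReadable}}, so that it runs without a cluster. The class name {{PositionalReadSketch}} and the helper {{readAt}} are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PositionalReadSketch {
    /**
     * Reads up to {@code length} bytes starting at {@code position}.
     * Assumption: a local FileChannel stands in for an HDFS stream here.
     * The positional read does not move the channel's own position.
     */
    static byte[] readAt(Path file, long position, int length) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(length);
            long pos = position;
            while (buf.hasRemaining()) {
                int n = ch.read(buf, pos);
                if (n < 0) {
                    break; // end of file reached before the buffer was filled
                }
                pos += n;
            }
            byte[] out = new byte[buf.position()]; // number of bytes actually read
            buf.flip();
            buf.get(out);
            return out;
        }
    }
}
```

A read that starts near the end of the file returns fewer bytes than requested, which a benchmark has to account for when counting bytes read.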
I propose three variants of such benchmarks:
# *Random read*. Randomly choose an offset in the range [0, fileSize) and read one buffer
of data from that position. Repeat the operation until the specified number of bytes has been read.
Random read can occasionally read the same bytes more than once.
# *Backward read*. Read the file in reverse order.
This is intended to read all bytes of the given file without reading any of them twice.
# *Skip read*. Starting from the beginning of the file, read one buffer of data, then jump ahead and read
again. Repeat until either the specified number of bytes has been read or the end of the file is reached.
Skip read avoids read-ahead. With sequential reads, data mostly comes from the system
block cache. Jumping ahead far enough ensures that bytes are actually read from the storage
device.
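
The three variants differ only in the sequence of offsets passed to the positional read. The sketch below generates those offsets under a few assumptions of mine: backward read uses buffer-aligned offsets so that no byte is read twice, skip read jumps a fixed {{skipSize}} bytes after each buffer, and random read counts requested bytes rather than bytes actually returned. It is not the TestDFSIO implementation, and the class {{ReadOffsets}} and all method and parameter names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ReadOffsets {
    /** Random read: offsets drawn from [0, fileSize) until bytesToRead bytes are requested. */
    static List<Long> randomOffsets(long fileSize, int bufferSize, long bytesToRead, long seed) {
        Random rnd = new Random(seed);
        List<Long> offsets = new ArrayList<>();
        for (long requested = 0; requested < bytesToRead; requested += bufferSize) {
            // floorMod keeps the result in [0, fileSize); the slight bias is ignored here
            offsets.add(Math.floorMod(rnd.nextLong(), fileSize));
        }
        return offsets;
    }

    /** Backward read: buffer-aligned offsets from the last block down to 0, no overlap. */
    static List<Long> backwardOffsets(long fileSize, int bufferSize) {
        List<Long> offsets = new ArrayList<>();
        if (fileSize <= 0) {
            return offsets;
        }
        long last = ((fileSize - 1) / bufferSize) * bufferSize; // start of the final (possibly short) block
        for (long pos = last; pos >= 0; pos -= bufferSize) {
            offsets.add(pos);
        }
        return offsets;
    }

    /** Skip read: read one buffer, jump skipSize bytes ahead, stop at EOF or bytesToRead. */
    static List<Long> skipOffsets(long fileSize, int bufferSize, long skipSize, long bytesToRead) {
        List<Long> offsets = new ArrayList<>();
        long read = 0;
        for (long pos = 0; pos < fileSize && read < bytesToRead; pos += bufferSize + skipSize) {
            offsets.add(pos);
            read += Math.min(bufferSize, fileSize - pos); // the last read may be short
        }
        return offsets;
    }
}
```

In this sketch, {{skipSize}} would need to exceed the read-ahead window for the jump to bypass cached data; the right value depends on the system under test.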
                
> Benchmarking random reads with DFSIO
> ------------------------------------
>
>                 Key: MAPREDUCE-4651
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4651
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: benchmarks, test
>    Affects Versions: 1.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>
> TestDFSIO measures throughput of HDFS write, read, and append operations. It will be
> useful to have an option to use it for benchmarking random reads.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
