hadoop-hdfs-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6596) Improve InputStream when read spans two blocks
Date Mon, 30 Jun 2014 15:52:25 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047779#comment-14047779 ]

Hadoop QA commented on HDFS-6596:

{color:green}+1 overall{color}.  Here are the results of testing the latest attachment 
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in hadoop-common-project/hadoop-common

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7252//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7252//console

This message is automatically generated.

> Improve InputStream when read spans two blocks
> ----------------------------------------------
>                 Key: HDFS-6596
>                 URL: https://issues.apache.org/jira/browse/HDFS-6596
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 2.4.0
>            Reporter: Zesheng Wu
>            Assignee: Zesheng Wu
>         Attachments: HDFS-6596.1.patch, HDFS-6596.2.patch, HDFS-6596.2.patch, HDFS-6596.2.patch, HDFS-6596.3.patch, HDFS-6596.3.patch
> In the current implementation of DFSInputStream, read(buffer, offset, length) is implemented as follows:
> {code}
> int realLen = (int) Math.min(len, (blockEnd - pos + 1L));
> if (locatedBlocks.isLastBlockComplete()) {
>   realLen = (int) Math.min(realLen, locatedBlocks.getFileLength());
> }
> int result = readBuffer(strategy, off, realLen, corruptedBlockMap);
> {code}
> From the above code, we can conclude that a read returns at most (blockEnd - pos + 1) bytes. As a result, when a read spans two blocks, the caller must call read() a second time to complete the request, and must wait a second time to acquire the DFSInputStream lock (read() is synchronized in DFSInputStream). For latency-sensitive applications such as HBase, this becomes a latency pain point under heavy contention for that lock. So we propose looping internally in read() to perform a best-effort read.
> In the current implementation of pread (read(position, buffer, offset, length)), the code already loops internally to perform a best-effort read, so we can refactor the normal read to do the same.
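
The difference between the two behaviours described above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions, not the code from the attached patches: the class BlockSpanningRead, its fixed blockSize, and the methods seek, readOneBlock and readBestEffort are all hypothetical names, and the real DFSInputStream also has to handle datanode selection, retries and corrupt blocks.

```java
import java.util.Arrays;

/** Toy model of a block-structured stream. NOT the real DFSInputStream. */
final class BlockSpanningRead {
    private final byte[] data;    // the whole "file", held in memory
    private final int blockSize;  // hypothetical fixed block size
    private int pos = 0;          // current read position

    BlockSpanningRead(byte[] data, int blockSize) {
        this.data = data;
        this.blockSize = blockSize;
    }

    synchronized void seek(int newPos) {
        pos = newPos;
    }

    /** Mimics the behaviour in the description: never reads past the current block. */
    synchronized int readOneBlock(byte[] buf, int off, int len) {
        if (len == 0) {
            return 0;
        }
        if (pos >= data.length) {
            return -1; // end of file
        }
        // Offset of the last byte of the block that contains pos.
        long blockEnd = Math.min(((long) (pos / blockSize) + 1) * blockSize, data.length) - 1;
        int realLen = (int) Math.min(len, blockEnd - pos + 1L);
        System.arraycopy(data, pos, buf, off, realLen);
        pos += realLen;
        return realLen;
    }

    /** Sketch of the proposal: loop while holding the lock once, until len bytes or EOF. */
    synchronized int readBestEffort(byte[] buf, int off, int len) {
        int total = 0;
        while (total < len) {
            int n = readOneBlock(buf, off + total, len - total);
            if (n < 0) {
                // EOF: report what was read so far, or -1 if nothing was read.
                return total == 0 ? -1 : total;
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] file = new byte[10];
        for (int i = 0; i < file.length; i++) {
            file[i] = (byte) i;
        }
        byte[] buf = new byte[6];

        BlockSpanningRead single = new BlockSpanningRead(file, 4);
        single.seek(2);
        System.out.println(single.readOneBlock(buf, 0, 6));   // 2: stops at the block boundary

        BlockSpanningRead looping = new BlockSpanningRead(file, 4);
        looping.seek(2);
        System.out.println(looping.readBestEffort(buf, 0, 6)); // 6: crosses into the next block
        System.out.println(Arrays.toString(buf));              // [2, 3, 4, 5, 6, 7]
    }
}
```

In this sketch the caller of readBestEffort takes the monitor once for the whole request, whereas a caller of readOneBlock has to come back and compete for it again whenever a request crosses a block boundary.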

This message was sent by Atlassian JIRA
