hadoop-hdfs-dev mailing list archives

From "Zesheng Wu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-6596) Improve InputStream when read spans two blocks
Date Tue, 24 Jun 2014 06:46:25 GMT
Zesheng Wu created HDFS-6596:

             Summary: Improve InputStream when read spans two blocks
                 Key: HDFS-6596
                 URL: https://issues.apache.org/jira/browse/HDFS-6596
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: hdfs-client
    Affects Versions: 2.4.0
            Reporter: Zesheng Wu
            Assignee: Zesheng Wu

In the current implementation of DFSInputStream, read(buffer, offset, length) is implemented
as follows:

int realLen = (int) Math.min(len, (blockEnd - pos + 1L));
if (locatedBlocks.isLastBlockComplete()) {
  realLen = (int) Math.min(realLen, locatedBlocks.getFileLength());
}
int result = readBuffer(strategy, off, realLen, corruptedBlockMap);
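To make the clamping concrete, here is a minimal, self-contained sketch that reproduces only the first line of the snippet above: the length computation against `blockEnd`, which is the offset of the last byte of the current block (inclusive). The class and method names (`ReadClamp`, `clampedLength`) are hypothetical, the last-block-complete branch is omitted, and the 128 MB block size is an assumed value chosen for illustration.

```java
public class ReadClamp {
    /**
     * Hypothetical helper mirroring the first line of the snippet:
     * one read call never returns bytes past blockEnd (inclusive).
     */
    static int clampedLength(int len, long pos, long blockEnd) {
        return (int) Math.min(len, blockEnd - pos + 1L);
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;  // assumed 128 MB blocks
        long blockEnd = blockSize - 1;        // last byte of the first block
        long pos = blockEnd - 4095;           // 4 KB before the block boundary
        int len = 64 * 1024;                  // caller asks for 64 KB
        // Only the 4096 bytes left in the first block come back from this
        // call; the remaining 61440 bytes require a second read().
        System.out.println(clampedLength(len, pos, blockEnd));
    }
}
```

Running `main` prints `4096`: a 64 KB request that starts 4 KB before a block boundary is cut short to 4 KB, which is exactly the case the proposal targets.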
From the above code, we can conclude that a single read returns at most (blockEnd - pos + 1)
bytes, i.e. it never crosses the end of the current block. As a result, when a read spans two
blocks, the caller must call read() a second time to complete the request, and must wait a
second time to acquire the DFSInputStream lock (read() is synchronized in DFSInputStream). For
latency-sensitive applications such as HBase, this becomes a latency pain point under heavy
contention for that lock. We therefore propose that read() loop internally to do a best-effort
read.

The current implementation of pread (read(position, buffer, offset, length)) already loops
internally to do a best-effort read, so we can refactor that logic to support the same behavior
for normal read.
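A minimal sketch of the proposed behavior, assuming a toy in-memory model rather than the real DFSInputStream: `readOnce` stands in for today's read(), which stops at the block boundary, and the synchronized `read` loops over it while holding the lock once, the way pread already loops. The class `BestEffortRead`, the method `readOnce`, and the fixed `blockSize` are all hypothetical names for illustration only.

```java
/**
 * Hypothetical model of a block-structured stream (not DFSInputStream).
 * readOnce() mimics the current read(): it never returns bytes past the
 * end of the current block.
 */
public class BestEffortRead {
    private final byte[] data;    // whole file contents (toy stand-in)
    private final int blockSize;  // assumed fixed block size
    private long pos = 0;         // current stream position

    public BestEffortRead(byte[] data, int blockSize) {
        this.data = data;
        this.blockSize = blockSize;
    }

    /** Single-block read, like the current behavior; returns -1 at EOF. */
    int readOnce(byte[] buf, int off, int len) {
        if (pos >= data.length) {
            return -1;
        }
        // Offset of the last byte of the current block (inclusive),
        // clipped to the end of the file.
        long blockEnd = Math.min((pos / blockSize + 1) * blockSize, data.length) - 1;
        int realLen = (int) Math.min(len, blockEnd - pos + 1L);
        System.arraycopy(data, (int) pos, buf, off, realLen);
        pos += realLen;
        return realLen;
    }

    /**
     * Proposed behavior: keep reading across block boundaries under a
     * single lock acquisition until len bytes are read or EOF is hit.
     */
    public synchronized int read(byte[] buf, int off, int len) {
        if (len == 0) {
            return 0;
        }
        int total = 0;
        while (total < len) {
            int n = readOnce(buf, off + total, len - total);
            if (n < 0) {
                break;  // EOF: return whatever has been read so far
            }
            total += n;
        }
        return total == 0 ? -1 : total;
    }
}
```

With a 10-byte file and 4-byte blocks, `readOnce(buf, 0, 6)` from position 0 returns 4, while the looping `read(buf, 0, 6)` returns all 6 bytes in one call. Callers must still handle short reads at EOF, as the `InputStream` contract allows. One question the sketch sidesteps is what to do if a later iteration throws after some bytes were already copied; returning the partial count is one plausible choice.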

This message was sent by Atlassian JIRA
