Date: Wed, 6 May 2015 03:35:40 +0000 (UTC)
From: "Allen Wittenauer (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-6596) Improve InputStream when read spans two blocks


     [ https://issues.apache.org/jira/browse/HDFS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated HDFS-6596:
-----------------------------------
    Labels: BB2015-05-TBR  (was: )

> Improve InputStream when read spans two blocks
> ----------------------------------------------
>
>                 Key: HDFS-6596
>                 URL: https://issues.apache.org/jira/browse/HDFS-6596
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 2.4.0
>            Reporter: Zesheng Wu
>            Assignee: Zesheng Wu
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-6596.1.patch, HDFS-6596.2.patch, HDFS-6596.2.patch, HDFS-6596.2.patch,
HDFS-6596.3.patch, HDFS-6596.3.patch
>
>
> In the current implementation of DFSInputStream, read(buffer, offset, length) is implemented as follows:
> {code}
> int realLen = (int) Math.min(len, (blockEnd - pos + 1L));
> if (locatedBlocks.isLastBlockComplete()) {
>   realLen = (int) Math.min(realLen, locatedBlocks.getFileLength());
> }
> int result = readBuffer(strategy, off, realLen, corruptedBlockMap);
> {code}
> From the code above, we can conclude that a single read returns at most (blockEnd - pos + 1) bytes. As a result, when a read spans two blocks, the caller must call read() a second time to complete the request, and must wait a second time to acquire the DFSInputStream lock (read() is synchronized in DFSInputStream). For latency-sensitive applications such as HBase, this becomes a latency pain point under heavy lock contention. We therefore propose looping internally in read() to do a best-effort read.
> The current implementation of pread (read(position, buffer, offset, length)) already loops internally to do a best-effort read, so we can refactor the code to support the same behavior in the normal read.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
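
The looping idea proposed in the issue description can be illustrated with a toy in-memory stream. This is only a rough sketch under stated assumptions, not the real DFSInputStream and not the attached patch: the class name BlockSpanningReadSketch, the fixed blockSize, and the method names readWithinBlock and readBestEffort are all hypothetical and exist only for illustration.

```java
/** Toy model of a block-structured stream; NOT the real DFSInputStream. */
final class BlockSpanningReadSketch {
    private final byte[] data;   // the whole "file", held in memory for the sketch
    private final int blockSize; // hypothetical fixed block size
    private int pos = 0;         // current stream position

    BlockSpanningReadSketch(byte[] data, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.data = data;
        this.blockSize = blockSize;
    }

    /** Behavior described in the issue: one call never crosses the current block's end. */
    synchronized int readWithinBlock(byte[] buf, int off, int len) {
        if (len == 0) {
            return 0;
        }
        if (pos >= data.length) {
            return -1; // end of file
        }
        // Inclusive offset of the last byte in the block containing pos, clipped to the file.
        long blockEnd = Math.min(((long) pos / blockSize + 1) * blockSize, data.length) - 1;
        int realLen = (int) Math.min(len, blockEnd - pos + 1L);
        System.arraycopy(data, pos, buf, off, realLen);
        pos += realLen;
        return realLen;
    }

    /** Proposed behavior: keep reading across block boundaries under one lock acquisition. */
    synchronized int readBestEffort(byte[] buf, int off, int len) {
        int total = 0;
        while (total < len) {
            // Java monitors are reentrant, so this nested call does not wait for the lock again.
            int n = readWithinBlock(buf, off + total, len - total);
            if (n < 0) {
                break; // end of file reached
            }
            total += n;
        }
        // Like InputStream.read: return -1 only if nothing could be read at end of file.
        return (total == 0 && len > 0) ? -1 : total;
    }
}
```

With a 10-byte file and 4-byte blocks, readWithinBlock(buf, 0, 6) from position 0 returns 4, so the caller has to come back and contend for the lock again, whereas readBestEffort(buf, 0, 6) returns 6 from a single call.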