Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Fri, 25 Apr 2014 07:10:15 +0000 (UTC)
From: "Liang Xie (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12710501.1398409812847.179253.1398409815605@arcas>
In-Reply-To: <JIRA.12710501.1398409812847@arcas>
References: <JIRA.12710501.1398409812847@arcas>
Subject: [jira] [Created] (HDFS-6286) adding a timeout setting for local
 read io
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

Liang Xie created HDFS-6286:
-------------------------------

             Summary: adding a timeout setting for local read io
                 Key: HDFS-6286
                 URL: https://issues.apache.org/jira/browse/HDFS-6286
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: hdfs-client
    Affects Versions: 2.4.0, 3.0.0
            Reporter: Liang Xie
            Assignee: Liang Xie


Currently, if a write or a remote read hits a sick disk, DFSClient.hdfsTimeout gives the caller a bounded time in which the call returns. It does not apply to local reads, however. Take an HBase scan for example:
DFSInputStream.read -> readWithStrategy -> readBuffer -> BlockReaderLocal.read -> dataIn.read -> FileChannelImpl.read
If this path hits a bad disk, the low-level read I/O can take tens of seconds, and what's worse, "DFSInputStream.read" holds a lock for the whole time.
As far as I know, there is no good mechanism to cancel a running read I/O (please correct me if I'm wrong). My proposal is to wrap the read request in a future and set a timeout on it; if the threshold is reached, we could probably add the local node to the dead node list.
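
To make the idea concrete, here is a minimal sketch of a read wrapped in a future with a timeout. It is not a patch against DFSInputStream: the class TimedLocalRead, the interface LocalRead, and the method readWithTimeout are hypothetical names used only for illustration, and the thread pool setup is an assumption. Only the java.util.concurrent calls are real.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimedLocalRead {
    // Hypothetical stand-in for the blocking local read
    // (the role BlockReaderLocal.read plays in the call chain above).
    public interface LocalRead {
        int read(ByteBuffer buf) throws IOException;
    }

    // Assumption: a shared pool of daemon threads, so a read stuck on a
    // bad disk cannot keep the JVM alive.
    private static final ExecutorService POOL =
        Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "local-read");
            t.setDaemon(true);
            return t;
        });

    // Runs the read on a pool thread and waits at most timeoutMs for it.
    public static int readWithTimeout(LocalRead reader, ByteBuffer buf,
                                      long timeoutMs) throws IOException {
        Callable<Integer> task = () -> reader.read(buf);
        Future<Integer> future = POOL.submit(task);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Best effort only: a read blocked inside the kernel may not
            // react to the interrupt, and the pool thread may still write
            // into buf later, so the caller must not reuse the buffer.
            future.cancel(true);
            // This is the point where the caller could mark the local
            // node as dead and fall back to another replica.
            throw new IOException(
                "local read exceeded " + timeoutMs + " ms", e);
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause;
            }
            throw new IOException(cause);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            throw new IOException("interrupted while waiting for read", e);
        }
    }
}
```

One caveat with this approach: the timeout only bounds how long the caller waits. The underlying read may keep occupying a pool thread until the disk responds, so the pool size and the buffer ownership would need care in a real change.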
Any thoughts?

