Date: Mon, 6 Jan 2014 07:56:00 +0000 (UTC)
From: "Binglin Chang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-4273) Problem in DFSInputStream read retry logic may cause early failure

    [ https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862833#comment-13862833 ]

Binglin Chang commented on HDFS-4273:
-------------------------------------

Hi LiuLei, this is indeed an issue. As I understand it, HBase mostly uses pread to read HFile data from multiple threads, but deadNodes is not per-thread. Hence there is a broader issue: if we change deadNodes in one thread, it affects the logic of other threads. This JIRA does not fix this broader issue.
I am thinking of a workaround: if a DN in deadNodes is local, we give it an expiry time and remove it once that time has passed. This is not perfect, but it should solve the issue most of the time.

> Problem in DFSInputStream read retry logic may cause early failure
> ------------------------------------------------------------------
>
>                 Key: HDFS-4273
>                 URL: https://issues.apache.org/jira/browse/HDFS-4273
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.2-alpha
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>            Priority: Minor
>         Attachments: HDFS-4273-v2.patch, HDFS-4273.patch, HDFS-4273.v3.patch, HDFS-4273.v4.patch, HDFS-4273.v5.patch, HDFS-4273.v6.patch, TestDFSInputStream.java
>
>
> Assume the following call logic:
> {noformat}
> readWithStrategy()
>   -> blockSeekTo()
>   -> readBuffer()
>      -> reader.doRead()
>      -> seekToNewSource() adds currentNode to deadNodes, hoping to get a different datanode
>         -> blockSeekTo()
>            -> chooseDataNode()
>               -> block missing, clears deadNodes and picks the currentNode again
>         seekToNewSource() returns false
>      readBuffer() re-throws the exception and quits the loop
> readWithStrategy() gets the exception, and may fail the read call before MaxBlockAcquireFailures attempts have been made.
> {noformat}
> Some issues with this logic:
> 1. The seekToNewSource() logic is broken because it may clear deadNodes in the middle.
> 2. The variable "int retries=2" in readWithStrategy seems to conflict with MaxBlockAcquireFailures; should it be removed?

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
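
The expiry workaround proposed in the comment could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from any HDFS-4273 patch: the class `ExpiringDeadNodes`, its methods, the string datanode ids, and the injectable clock are all hypothetical names introduced here. It assumes one dead-node map shared by all reader threads, where entries for local datanodes carry an expiry time and entries for remote datanodes never expire.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not part of HDFS): a dead-node set shared by reader
 * threads in which entries for local datanodes expire after a fixed time.
 */
class ExpiringDeadNodes {
    /** Expiry marker for entries that never expire (remote datanodes). */
    private static final long NEVER = Long.MAX_VALUE;

    private final long localExpiryMillis;
    private final LongSupplier clock; // injectable so the expiry can be tested
    // datanode id -> time in ms at which the entry stops counting as dead
    private final Map<String, Long> deadNodes = new ConcurrentHashMap<>();

    ExpiringDeadNodes(long localExpiryMillis, LongSupplier clock) {
        this.localExpiryMillis = localExpiryMillis;
        this.clock = clock;
    }

    /** Records a failed datanode; only local nodes get an expiry time. */
    void markDead(String datanodeId, boolean isLocal) {
        long expiresAt = isLocal ? clock.getAsLong() + localExpiryMillis : NEVER;
        deadNodes.put(datanodeId, expiresAt);
    }

    /** Returns true if the node still counts as dead; drops an expired entry. */
    boolean isDead(String datanodeId) {
        Long expiresAt = deadNodes.get(datanodeId);
        if (expiresAt == null) {
            return false;
        }
        if (clock.getAsLong() >= expiresAt) {
            // Two-argument remove: only deletes the entry we just inspected,
            // so a fresh markDead() from another thread is not thrown away.
            deadNodes.remove(datanodeId, expiresAt);
            return false;
        }
        return true;
    }
}
```

A reader would construct this with something like `new ExpiringDeadNodes(expiryMillis, System::currentTimeMillis)` and consult `isDead()` when choosing a datanode. As the comment says, this does not make deadNodes per-thread; it only bounds how long one thread's failure can keep other threads away from the local datanode.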