Date: Tue, 29 Mar 2016 23:29:25 +0000 (UTC)
From: "Arpit Agarwal (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-9901) Move disk IO out of the heartbeat thread

    [ https://issues.apache.org/jira/browse/HDFS-9901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217065#comment-15217065 ]

Arpit Agarwal commented on HDFS-9901:
-------------------------------------

I am also going to look at this change some more as I get time.

> Move disk IO out of the heartbeat thread
> ----------------------------------------
>
>                 Key: HDFS-9901
>                 URL: https://issues.apache.org/jira/browse/HDFS-9901
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Hua Liu
>            Assignee: Hua Liu
>         Attachments: 0001-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch, 0002-HDFS-9901-Move-block-validation-out-of-the-heartbeat.patch, 0003-HDFS-9901-Move-disk-IO-out-of-the-heartbeat-thread.patch, 0004-HDFS-9901-move-diskIO-out-of-the-heartbeat-thread.patch, 0005-HDFS-9901-Move-diskIO-out-of-heartbeat-thread.patch
>
>
> During heavy disk IO, we noticed that the heartbeat thread hangs in the checkBlock method, which checks the existence and length of a block before spinning off a thread to do the actual transfer. In extreme cases, the heartbeat thread hung for more than 10 minutes, so the namenode marked the datanode as dead and started replicating its blocks, which caused more disk IO on other nodes and could potentially bring them down as well.
> The patch contains two changes:
> 1. Makes DF asynchronous when monitoring the disk by creating a thread that checks the disk and updates the disk status periodically. When the heartbeat thread generates the storage report, it reads disk usage information from memory, so it does not block during heavy disk IO.
> 2. Moves the checks (which require disk access) in transferBlock() in DataNode into a separate thread, so the heartbeat does not have to wait for them when heartbeating.
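
For readers skimming the thread, the two changes in the description can be sketched roughly as below. This is only an illustration under stated assumptions, not code from the attached patches: the class names CachedDiskUsage and AsyncBlockTransfer, the getAvailable() accessor, and the simplified transferBlock(File, long) signature are all hypothetical, and java.io.File stands in for the datanode's real volume and block abstractions.

```java
import java.io.File;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: daemon threads so the sketch never keeps the JVM alive. */
final class Daemons {
    static ThreadFactory named(String name) {
        return r -> {
            Thread t = new Thread(r, name);
            t.setDaemon(true);
            return t;
        };
    }
}

/** Change 1 (sketch): a background thread refreshes disk usage; readers never touch the disk. */
class CachedDiskUsage implements AutoCloseable {
    private final File volume;
    private final ScheduledExecutorService refresher =
        Executors.newSingleThreadScheduledExecutor(Daemons.named("disk-usage-refresher"));
    // volatile so the heartbeat thread sees the refresher's latest value
    private volatile long availableBytes;

    CachedDiskUsage(File volume, long periodMillis) {
        this.volume = volume;
        // One blocking read at construction time, before any heartbeat depends on it.
        this.availableBytes = volume.getUsableSpace();
        refresher.scheduleWithFixedDelay(
            this::refresh, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    private void refresh() {
        // May stall under heavy disk IO, but only this background thread waits.
        availableBytes = volume.getUsableSpace();
    }

    /** Called from the heartbeat path: a plain memory read, possibly slightly stale. */
    long getAvailable() {
        return availableBytes;
    }

    @Override
    public void close() {
        refresher.shutdownNow();
    }
}

/** Change 2 (sketch): block checks that need disk access run off the calling thread. */
class AsyncBlockTransfer {
    private final ExecutorService pool =
        Executors.newFixedThreadPool(2, Daemons.named("block-transfer"));

    /** Returns immediately; the future completes with false if validation fails. */
    CompletableFuture<Boolean> transferBlock(File blockFile, long expectedLength) {
        return CompletableFuture.supplyAsync(() -> {
            // Existence and length checks hit the disk, so they run on the pool thread.
            if (!blockFile.exists() || blockFile.length() != expectedLength) {
                return false;
            }
            // The actual transfer would start here (omitted in this sketch).
            return true;
        }, pool);
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

The trade-off in the first class is that the heartbeat may report a value up to one refresh period old, which seems acceptable for a storage report. In the second, the caller only pays the cost of submitting a task, so a slow disk delays the transfer rather than the heartbeat.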