Date: Thu, 29 Jun 2017 22:12:00 +0000 (UTC)
From: "Chen Liang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-12043) Add counters for block re-replication

     [ https://issues.apache.org/jira/browse/HDFS-12043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chen Liang updated HDFS-12043:
------------------------------
    Attachment: HDFS-12043.003.patch

Thanks [~arpitagarwal] for the comments! Posted the v003 patch to rename the metrics and add the {{if (pendingNum > 0)}} check.

> Add counters for block re-replication
> -------------------------------------
>
>                 Key: HDFS-12043
>                 URL: https://issues.apache.org/jira/browse/HDFS-12043
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Chen Liang
>            Assignee: Chen Liang
>         Attachments: HDFS-12043.001.patch, HDFS-12043.002.patch, HDFS-12043.003.patch
>
>
> We occasionally see that the under-replicated block count is not going down quickly enough. We've made at least one fix to speed up block replication (HDFS-9205), but we need better insight into the current state and activity of the block re-replication logic. For example, we need to understand whether re-replication is not making forward progress at all, or whether new under-replicated blocks are being added faster than they are repaired.
> We should include additional metrics:
> # Cumulative number of blocks that were successfully replicated.
> # Cumulative number of re-replications that timed out.
> # Cumulative number of blocks that were dequeued for re-replication but not scheduled, e.g. because they were invalid or under construction, or because replication was postponed.
>
> The growth rate of the above metrics will make it clear whether block replication is making forward progress and, if not, provide potential clues about why it is stalled.
> Thanks [~arpitagarwal] for the offline discussions.
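The proposal reasons from how fast the counters grow rather than from their absolute values. The following is a minimal Java sketch of that idea, not the code in the HDFS-12043 patches: the class `ReReplicationCounters`, its methods, the `Snapshot` type, and the `underReplicatedDelta` input (assumed to be read from a separate under-replicated block count gauge) are all hypothetical names chosen for illustration, and the real NameNode metrics are not plain `AtomicLong` fields. It shows how deltas of the three counters between two snapshots could separate "stalled" from "progressing but outpaced".

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch of the three cumulative counters proposed in HDFS-12043.
 * Names are illustrative assumptions, not the names used by the actual patch.
 */
public class ReReplicationCounters {

    // 1. Blocks that were successfully re-replicated.
    private final AtomicLong successful = new AtomicLong();
    // 2. Re-replications that timed out.
    private final AtomicLong timedOut = new AtomicLong();
    // 3. Blocks dequeued for re-replication but not scheduled
    //    (e.g. invalid, under construction, or postponed).
    private final AtomicLong notScheduled = new AtomicLong();

    public void incSuccessful() { successful.incrementAndGet(); }
    public void incTimedOut() { timedOut.incrementAndGet(); }
    public void incNotScheduled() { notScheduled.incrementAndGet(); }

    /** Immutable point-in-time copy of the three counters. */
    public static final class Snapshot {
        final long successful;
        final long timedOut;
        final long notScheduled;

        Snapshot(long successful, long timedOut, long notScheduled) {
            this.successful = successful;
            this.timedOut = timedOut;
            this.notScheduled = notScheduled;
        }
    }

    public Snapshot snapshot() {
        return new Snapshot(successful.get(), timedOut.get(), notScheduled.get());
    }

    /**
     * Interprets counter growth between two snapshots. underReplicatedDelta is the
     * change in the under-replicated block count over the same interval
     * (hypothetical input from a separate gauge).
     */
    public static String diagnose(Snapshot before, Snapshot after, long underReplicatedDelta) {
        long dSuccess = after.successful - before.successful;
        long dTimeout = after.timedOut - before.timedOut;
        long dNotScheduled = after.notScheduled - before.notScheduled;

        if (dSuccess == 0) {
            // No forward progress: the other two counters hint at the cause.
            if (dTimeout > 0) return "stalled: re-replications are timing out";
            if (dNotScheduled > 0) return "stalled: dequeued blocks are not being scheduled";
            return "stalled: no re-replication activity";
        }
        // Replications succeed, yet the backlog still grows.
        if (underReplicatedDelta > 0) {
            return "progressing, but new under-replicated blocks arrive faster";
        }
        return "progressing";
    }

    public static void main(String[] args) {
        ReReplicationCounters c = new ReReplicationCounters();
        Snapshot t0 = c.snapshot();
        c.incTimedOut();
        c.incTimedOut();
        Snapshot t1 = c.snapshot();
        // prints "stalled: re-replications are timing out"
        System.out.println(diagnose(t0, t1, 0));
    }
}
```

Comparing snapshots taken at two points in time, rather than reading the counters once, is what turns cumulative totals into the growth rates the proposal relies on.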