Date: Fri, 31 Jul 2015 21:30:05 +0000 (UTC)
From: "Allen Wittenauer (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-6682) Add a metric to expose the timestamp of the oldest under-replicated block

[ https://issues.apache.org/jira/browse/HDFS-6682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649876#comment-14649876 ]

Allen Wittenauer commented on HDFS-6682:
----------------------------------------

We have no insight into how long a given replication might have been hanging around, so we have no way to really answer that question. We know the queue gets backed up during cascading DN failure events (thanks, very slow NM memory checker + fast-acting bad job + Linux OOM killer!), so I was always under the impression that the whole queue is just super busy, rather than old entries never getting cleared.
A rate metric might be useful, at least to tell us whether the queue is stuck and/or to project how long it will remain behind.

> Add a metric to expose the timestamp of the oldest under-replicated block
> -------------------------------------------------------------------------
>
>                 Key: HDFS-6682
>                 URL: https://issues.apache.org/jira/browse/HDFS-6682
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Akira AJISAKA
>            Assignee: Akira AJISAKA
>              Labels: metrics
>         Attachments: HDFS-6682.002.patch, HDFS-6682.003.patch, HDFS-6682.004.patch, HDFS-6682.005.patch, HDFS-6682.006.patch, HDFS-6682.patch
>
>
> In the following case, the data in HDFS is lost and a client needs to put the same file again.
> # A client puts a file to HDFS.
> # A DataNode crashes before replicating a block of the file to other DataNodes.
> I propose a metric to expose the timestamp of the oldest under-replicated/corrupt block. That way a client can know which file to retain for the retry.
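To make the two proposals in this thread concrete (the oldest-timestamp metric from the issue, and the drain-rate idea from the comment), here is a minimal sketch in Python. It is not HDFS's actual under-replication queue or any of the attached patches; the class `UnderReplicatedTracker`, its methods, and the trailing-window rate calculation are hypothetical, chosen only for illustration. It assumes blocks are added with a non-decreasing clock and tracks under-replicated blocks only (not corrupt ones).

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Optional


class UnderReplicatedTracker:
    """Hypothetical tracker, not HDFS's real queue API.

    Records when each block first became under-replicated, exposes the
    oldest such timestamp, and estimates how fast the queue is draining.
    """

    def __init__(self, window_secs: float = 60.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._window = window_secs
        # dicts preserve insertion order, so the first entry is the oldest
        self._added_at: Dict[int, float] = {}
        # completion times of recent replications, used for the drain rate
        self._completions: Deque[float] = deque()

    def add(self, block_id: int) -> None:
        # keep the original timestamp if the block is reported again
        if block_id not in self._added_at:
            self._added_at[block_id] = self._clock()

    def replicated(self, block_id: int) -> None:
        # only count completions for blocks we were actually tracking
        if self._added_at.pop(block_id, None) is not None:
            self._completions.append(self._clock())

    def oldest_timestamp(self) -> Optional[float]:
        # first value in insertion order = block that has waited longest
        return next(iter(self._added_at.values()), None)

    def size(self) -> int:
        return len(self._added_at)

    def drain_rate(self) -> float:
        """Blocks replicated per second over the trailing window."""
        now = self._clock()
        while self._completions and now - self._completions[0] > self._window:
            self._completions.popleft()
        return len(self._completions) / self._window

    def seconds_to_drain(self) -> Optional[float]:
        """Projected time to clear the queue; None if it looks stuck."""
        rate = self.drain_rate()
        if self.size() == 0:
            return 0.0
        if rate == 0.0:
            return None
        return self.size() / rate
```

Because entries are removed on replication and new ones are appended, the oldest pending block is always at the front of the dict, so the timestamp metric is cheap to read. A drain rate of zero while the queue is non-empty is the "stuck" signal, and size divided by rate gives a rough projection of how long the backlog will last.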